Questions about Celery

I'm trying to learn what Celery is and does. I've gone through some basic tutorials, like First Steps with Celery and this one, and I have a few questions about Celery:
In the (first steps) tutorial, you start a Celery worker and then you can basically just open an interpreter & call the task defined as:
>>> from tasks import add
>>> add.delay(4, 4)
So my questions are:
What's happening here? add() is a method that we wrote in the tasks file and we're calling add.delay(4,4). So we're calling a method over a method!?
How does this work? How does the 'delay' method get added to 'add'?
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method? - like it would have been if I had just defined a method called add in the interpreter and just executed
add(4,4)
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is importing a method from the module we wrote and calling that method. How does control get passed to the Celery worker?
Also, while answering #4, it'd be great if you could tell me how you know this. I'd be very curious to know whether these things are documented somewhere that I'm missing or failing to understand, and how I could have known the answer. Thanks much in advance!

What's happening here? add() is a method that we wrote in the tasks file and we're calling add.delay(4,4). So we're calling a method over a method!?
Everything is an object in Python. Everything has properties. Functions/methods also have properties. For example:
def foo(): pass
print(foo.__name__)
This is nothing special syntax-wise.
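To make this concrete, here is a tiny, Celery-free illustration (this delay is just a made-up stand-in, not the real one) showing that you can attach attributes, even other callables, to a function object:
def add(x, y):
    return x + y

# functions are ordinary objects, so we can hang extra attributes on them
add.delay = lambda x, y: print(f"pretend we queued add({x}, {y})")

add(4, 4)        # runs the function directly and returns 8
add.delay(4, 4)  # calls the attribute we just attached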
How does this work? How does the delay method get added to add?
The @app.task decorator does that.
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method?
Yes, the worker does that. Otherwise this would be pretty nonsensical. You're passing two arguments (4 and 4) to the Celery system which passes them on to the worker, which does the actual work, in this case addition.
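You can also see this from the return value: delay does not hand you 8 directly, it hands you an AsyncResult, and the 8 only exists once a worker has processed the message (this sketch assumes a result backend is configured so that get() can fetch the result):
>>> from tasks import add
>>> result = add.delay(4, 4)   # returns immediately; a message is sent to the broker
>>> result.get(timeout=10)     # blocks until a worker has run the task and stored the result
8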
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is importing a method from the module we wrote and calling that method. How does control get passed to the Celery worker?
Again, the @app.task decorator abstracts a lot of magic here. This decorator registers the function with the celery worker pool. It also adds magic properties to that function that allow you to call it in the celery worker pool, namely delay. Imagine this instead:
def foo(): pass
celery.register_worker('foo', foo)
celery.call('foo')
The decorator is essentially doing just that, without you having to repeatedly write foo in different ways. It uses the function itself as the identifier for you, purely as syntactic sugar, so your code doesn't have to distinguish between foo() and 'foo'.
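For a rough idea of what such a decorator might be doing under the hood, here is a toy sketch (not Celery's actual implementation, which serializes the call and sends it to a broker instead of running it locally):
REGISTRY = {}

def task(func):
    # register the function under its name, the way @app.task registers tasks
    REGISTRY[func.__name__] = func

    def delay(*args, **kwargs):
        # real Celery would serialize this call and publish it to the broker;
        # here we just look the function up by name and run it locally
        print(f"queueing {func.__name__}{args}")
        return REGISTRY[func.__name__](*args, **kwargs)

    func.delay = delay   # attach the extra entry point to the function object
    return func

@task
def add(x, y):
    return x + y

add.delay(4, 4)   # goes through the "registry" instead of calling add directly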

Related

Task scheduling in Flink: how to place subtasks in a slot of a specified task manager?

Recently I have been studying task scheduling in Flink. My goal is to schedule subtasks onto a slot of a node I specify, by modifying some of the source code of the scheduling part. Through remote debugging and reading the source code, I found the following method call stack, most of which I can't understand (there are few comments), especially this method: org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot. I guess the code that allocates slots to tasks is around here. Because the source code is hard to read, I have to ask you for help. Of course, if there is a better way to achieve what I need, please suggest one or two. I sincerely look forward to your reply, thank you very much!
The method call stack is as follows (the version of Flink I use is 1.11.1):
org.apache.flink.runtime.jobmaster.JobMaster#startJobExecution
org.apache.flink.runtime.jobmaster.JobMaster#resetAndStartScheduler
org.apache.flink.runtime.jobmaster.JobMaster#startScheduling
org.apache.flink.runtime.scheduler.SchedulerBase#startScheduling
org.apache.flink.runtime.scheduler.DefaultScheduler#startSchedulingInternal
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#startScheduling
(This is actually the call chain of the PipelinedRegionSchedulingStrategy class; for simplicity I write it as the call chain of the EagerSchedulingStrategy class, which should make no difference.)
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlots
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator#allocateSlotsFor
org.apache.flink.runtime.executiongraph.SlotProviderStrategy.NormalSlotProviderStrategy#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlotInternal
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#internalAllocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSharedSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot
(I feel this is the key place where a slot is allocated to a subtask, i.e. an execution vertex, but there are no comments and I don't understand the idea behind the process, so I can't follow it.)

Understanding GLib Task and Context

I don't understand the GTask functionality. Why do I need it?
In my mind it is like a callback: you attach a callback to a source in some context, and the callback is then called when the event happens.
In general, I'm a bit confused about what a Context and a Task are in GLib and why we need them.
In my understanding there is a main loop (only one?) that can run several contexts (what is a context?), and each context is related to several sources, which in turn have callbacks that act like handlers.
So can someone please make some sense of it all for me?
I don't understand the GTask functionality. Why do I need it? In my mind it is like a callback: you attach a callback to a source in some context, and the callback is then called when the event happens.
The main functionality GTask exposes is easily and safely running a task in a thread and returning the result back to the main thread.
In general, I'm a bit confused about what a Context and a Task are in GLib and why we need them. In my understanding there is a main loop (only one?) that can run several contexts (what is a context?), and each context is related to several sources, which in turn have callbacks that act like handlers.
For simplicity I think it's safe to consider contexts and loops the same thing, and there can be multiple of them. So, in order to be thread-safe, the task must know which context the result should be returned to.
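As a very rough analogy in plain Python (ordinary threading, not the GLib/GTask API): run the blocking work in a background thread and hand the result back to the thread that owns the "main loop" through something that thread polls:
import queue
import threading

results = queue.Queue()   # stands in for the source the main context watches

def heavy_work():
    value = sum(range(1_000_000))   # the blocking task
    results.put(value)              # hand the result back to the main thread

threading.Thread(target=heavy_work).start()

# the "main loop" (here just a blocking get) receives the result on its own thread
print(results.get())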

How to use canvas without importing the task module in a Celery client?

Suppose that we have a caller A and a callee B. B implements some function and has heavy dependencies.
Typically, I don't want to import these dependencies in A to keep it lightweight.
Previously, I could happily use send_task to call B by name.
Now I have more complex logic and want to orchestrate tasks with canvas. Following the user guide:
signature('tasks.add', args=(2, 2), countdown=10)
I got a NotRegistered error for the task.
How do I register a task by name?
Without seeing your code, we can't actually help you. So I'll answer your question generically using the docs:
For reference, here is the example snippet from celery:
from celery import Celery
app = Celery('tasks', broker='pyamqp://guest@localhost//')

@app.task
def add(x, y):
    return x + y
Note that the task name is tasks.add because the module is named tasks (the same name passed to the Celery constructor) and the function is named add. This task is registered because it lives in a tasks.py file. So, to call a task by name we have to make sure of three things:
Is the task registered properly? We can check by looking at the output when a celery worker starts, and the worker outputs a list of registered tasks. If not, make sure the task is in a tasks.py file or a tasks module that celery knows to look for.
What is my application name? We can find this by looking for the call to the Celery constructor. It is what you pass to the -A parameter when starting a worker.
What is my task name? If no name is specified in the app.task decorator, use the function name. Otherwise use the name specified in the app.task decorator.
After ensuring (1), concatenate (2) and (3), and put a . in between them to get the task name to call.
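For example, with the tasks.add task above, either of these calls the task purely by name, without importing the tasks module (the broker URL here is the same assumption as in the snippet above):
from celery import Celery

app = Celery(broker='pyamqp://guest@localhost//')

# fire-and-forget by name
app.send_task('tasks.add', args=(2, 2))

# or build a signature by name, usable inside a canvas (chain, group, chord, ...)
app.signature('tasks.add', args=(2, 2), countdown=10).apply_async()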

django - difference between signals and celery

This may be a lame question, but I am really confused by these two. I know signals are used to do some task when something has happened. But what about Celery? In the documentation it says:
Celery is an asynchronous task queue/job queue based on distributed message passing.
Will someone please explain to me what Celery is? What's the difference between these two, and when should each be used? It will be much appreciated! Thank you.
First of all, Django signals are synchronous. For example, suppose you have a signal handler for the before_save (pre_save in Django terms) action of SomeModel, and you have, say, a view function like this:
def some_view(request):
    # do some stuff
    SomeModel(some_field="Hello").save()
    # do other stuff
The timeline of your code will be like so:
Do some stuff
Do SomeModel(some_field="Hello").save() call:
Do some django stuff
Execute signal before_save handler just before actual save
Do actual save to DB
Do other stuff
Signals run in the same instance of the Python interpreter (in other words, in the same OS process).
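A minimal sketch of such a synchronous handler (Django's hook is called pre_save; SomeModel is the model from the snippet above):
from django.db.models.signals import pre_save
from django.dispatch import receiver

@receiver(pre_save, sender=SomeModel)
def before_save_handler(sender, instance, **kwargs):
    # runs in the same process and blocks save() until it returns
    print("about to save", instance.some_field)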
Celery provides asynchronous tasks. Those tasks MAY be called synchronously, like Django signals (e.g. celery_task(arg1="Hello")), but the common case is async calls:
celery_task.delay(arg1="Hello")
This is not a simple function call. The function will be executed in another Python process (a Celery worker). After this call you can decide: do you want to wait for the result of this function, keep going with your code, or do something trickier?
Celery is very handy when you want to do background or scheduled tasks like resizing images, decoding videos, updating a Facebook status, etc.
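A small sketch of that pattern (the resize_image task, the save_upload helper and the view are made up; a broker and a running worker are assumed):
# tasks.py
from celery import shared_task

@shared_task
def resize_image(image_id):
    ...   # the heavy work happens in the worker process

# views.py
from django.http import HttpResponse

def upload_view(request):
    image = save_upload(request)    # hypothetical helper that stores the upload
    resize_image.delay(image.id)    # enqueue the job and return immediately
    return HttpResponse("processing")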

Synchronous Scala Future without a separate thread

I'm building a library that, as part of its functionality, makes HTTP requests. To get it to work in the multiple environments it'll be deployed in I'd like it to be able to work with or without Futures.
One option is to have the library parametrise the type of its response so you can create an instance of the library with type Future, or an instance with type Id, depending on whether you are using an asynchronous HTTP implementation. (Id might be an Identity monad - enough to expose a consistent interface to users)
I've started with that approach but it has got complicated. What I'd really like to do instead is use the Future type everywhere, boxing synchronous responses in a Future where necessary. However, I understand that using Futures will always entail some kind of threadpool. This won't fly in e.g. AppEngine (a required environment).
Is there a way to create a Future from a value that will be executed on the current thread and thus not cause problems in environments where it isn't possible to spawn threads?
(p.s. as an additional requirement, I need to be able to cross build the library back to Scala v2.9.1 which might limit the features available in scala.concurrent)
From what I understand, you wish to execute something and then wrap the result in a Future. In that case, you can always use a Promise:
val p = Promise[Int]()
p success 42
val f = p.future
Hence you now have a Future wrapper containing the final value 42.
Promise is very well explained here.
Take a look at the Scalaz version of the Future trait. It is built on top of a Trampoline mechanism, so it is executed by the current thread unless fork or apply is called, and it completely removes the need for ExecutionContext imports =)