This may be a lame question, but I am really confused by these two. I know signals are used to perform some task when something has happened. But what about Celery? In the documentation it says:
Celery is an asynchronous task queue/job queue based on distributed message passing.
Will someone please explain what Celery is, what the difference between the two is, and when to use each? It will be much appreciated! Thank you.
First of all, Django signals are synchronous. For example, suppose you have a signal handler for the pre_save action of SomeModel, and you have, say, a view function like this:
def some_view(request):
    # do some stuff
    SomeModel(some_field="Hello").save()
    # do other stuff
The timeline of your code will look like this:
Do some stuff
Do the SomeModel(some_field="Hello").save() call:
    Do some Django stuff
    Execute the pre_save signal handler just before the actual save
    Do the actual save to the DB
Do other stuff
Signal handlers run in the same Python interpreter instance (in other words, in the same OS process) as the code that triggered them.
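For illustration, here is a minimal sketch of such a synchronous handler (this assumes a Django app whose models module defines SomeModel; the app path and handler name are made up for the example):

from django.db.models.signals import pre_save
from django.dispatch import receiver

from myapp.models import SomeModel  # hypothetical app; use your own model

@receiver(pre_save, sender=SomeModel)
def handle_pre_save(sender, instance, **kwargs):
    # Runs in the same process, right before SomeModel(...).save() writes to the DB.
    print("About to save:", instance.some_field)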
Celery provides asynchronous tasks. Those tasks MAY be called synchronously, just like Django signals (e.g. celery_task(arg1="Hello") runs in the current process). But the common case is an asynchronous call:
celery_task.delay(arg1="Hello")
This is not a simple function call: the function will be executed in another Python process (a Celery worker). After making the call you can decide whether you want to wait for the result of this function, keep going with your code, or do something more elaborate.
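Here is a minimal sketch of such a task and the two ways of calling it (the module name, broker URL, and task body are assumptions for illustration, not taken from the question):

# tasks.py -- minimal sketch; the Redis broker URL is an assumed example
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")

@app.task
def celery_task(arg1):
    # When invoked via .delay(), this body runs in a Celery worker process.
    return "processed " + arg1

# --- In your calling code (e.g. a Django view) ---

# Synchronous call: runs in the current process, like a plain function.
celery_task(arg1="Hello")

# Asynchronous call: sends a message to the broker and returns an AsyncResult
# immediately, while a worker picks up the job and executes it.
result = celery_task.delay(arg1="Hello")
# result.get(timeout=10)  # optionally block for the value
#                         # (requires a result backend to be configured)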
Celery is very handy when you want to run background or scheduled tasks, such as resizing images, decoding videos, updating a Facebook status, etc.
How can I send all resource units from a resource pool to maintenance or break whenever another event in another block occurs?
I'm looking for something like: resourcePool.startMaintenance()
that I can write inside an "On start" box in some other block of the flowchart. Of course, I would then need to end the maintenance with something like resourcePool.stopMaintenance() and resume all the tasks the resource units were executing.
Any ideas? Or some way to manually pause all resource units from executing their tasks and then resume them?
Note: suspending the agent seized in the Seize block with SeizeBlock.suspend() and SeizeBlock.resume() is not an option, because the resources have preparation tasks and those tasks also need to be paused.
Thank you!
You should use the Downtime block that is designed for this setup.
You can control it any way you like. In your case, myDowntimeblock.startTask(someAgent) and stopTask(sameAgent) work.
Also check the example model named "Maintenance of a Coffee machine": It shows all the other ways of using the block.
Recently, I have been studying the problem of task scheduling in Flink. My goal is to schedule subtasks to a slot on a node of my choosing, according to my own needs, by modifying some of the source code of the scheduling part. Through remote debugging and reading the source code, I found the method call stack below, most of which I can't understand (there are very few comments), especially this method: org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot. I guess the code that allocates slots to tasks is somewhere around here. Because the source code is too difficult to read on my own, I have to ask for your help. Of course, if there is a better way to achieve what I need, please point it out. I sincerely look forward to your reply. Thank you very much!
The method call stack is as follows (the version of Flink I use is 1.11.1):
org.apache.flink.runtime.jobmaster.JobMaster#startJobExecution
org.apache.flink.runtime.jobmaster.JobMaster#resetAndStartScheduler
org.apache.flink.runtime.jobmaster.JobMaster#startScheduling
org.apache.flink.runtime.scheduler.SchedulerBase#startScheduling
org.apache.flink.runtime.scheduler.DefaultScheduler#startSchedulingInternal
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#startScheduling
(The call chain for the PipelinedRegionSchedulingStrategy class is similar; for simplicity I have written it as the call chain of the EagerSchedulingStrategy class, which should make no difference.)
org.apache.flink.runtime.scheduler.strategy.EagerSchedulingStrategy#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlotsAndDeploy
org.apache.flink.runtime.scheduler.DefaultScheduler#allocateSlots
org.apache.flink.runtime.scheduler.DefaultExecutionSlotAllocator#allocateSlotsFor
org.apache.flink.runtime.executiongraph.SlotProviderStrategy.NormalSlotProviderStrategy#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSlotInternal
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#internalAllocateSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateSharedSlot
org.apache.flink.runtime.jobmaster.slotpool.SchedulerImpl#allocateMultiTaskSlot
(I feel this is the key method that allocates a slot to a subtask, that is, an execution vertex, but it has no comments and I cannot follow the logic.)
What is the use case of thenAcceptAsync? There is an explanation at https://lettuce.io/core/release/reference/#asynchronous-api, but I do not get it.
I thought thenAccept already handles the event asynchronously and does not block the main thread. In what cases do I need thenAcceptAsync? It would be great to see an example.
thenAccept callbacks are executed on Netty's event loop, so if you run a long or blocking operation inside thenAccept you are going to block the event loop.
With thenAcceptAsync you can run the long-running/blocking part of your code on a separate executor and keep the event loop free.
I am working on a Perl script which does some periodic processing based on file-system contents.
The overall structure is like this:
# ... initialization...
while(1) {
    # ... scan filesystem, perform actions depending on changes detected ...
    sleep 5;
}
I would like to add the ability to feed data into this process by exposing an interface through HTTP. E.g. I would like an endpoint to skip the sleep, but also some means to input data that is processed in the next iteration. Additionally, I would like to be able to query some of the program's status through HTTP (so a simple fork() to run the webserver part in a separate process would presumably be insufficient, since the two processes would not share state).
So far I have used the Dancer2 framework once, but it has a start; call that blocks and thus does not allow any other tasks (like my loop) to run. I could of course move the code that is currently inside the loop to an endpoint exposed through Dancer2, but then I would need to call that endpoint periodically (through an external program?), which seems a rather obscure indirection compared to just having the webserver part running in the background.
Is it possible to unobtrusively (i.e. without blocking the program) add a REST-server capability to a Perl script? If yes: Which modules would be used for the purpose? If no: Should I really implement an external process to periodically invoke a certain endpoint or pursue a different solution altogether?
(I tried to add a dancer2 tag but could not due to insufficient reputation. Do not be misled by this: I have so far only tried Dancer2, not Dancer (v1).)
You could try launching your processing loop in a background thread before you call start;.
See man perlthrtut
You probably want a use threads::shared; statement to declare some variables as shared between the REST part and the background thread, or use dedicated queues/event mechanisms.
I'm trying to learn what Celery is and does, and I've gone through some basic tutorials like First Steps with Celery and this one, and I have a few questions about Celery:
In the First Steps tutorial, you start a Celery worker and then you can basically just open an interpreter and call the task defined there:
>>> from tasks import add
>>> add.delay(4, 4)
So my questions are:
What's happening here? add() is a method that we wrote in the tasks file, and we're calling add.delay(4,4). So we're calling a method on a method!?
How does this work? How does the 'delay' method get added to 'add'?
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method? - like it would have been if I had just defined a method called add in the interpreter and just executed
add(4,4)
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is - importing a method from the module we wrote and call that method. How does control get passed to the Celery worker?
Also, while answering #4, it'd be great if you could tell me how you know this. I'd be very curious to know whether these things are documented somewhere that I'm missing or failing to understand, and how I could have known the answer. Thanks much in advance!
What's happening here? add() is a method that we wrote in the tasks file, and we're calling add.delay(4,4). So we're calling a method on a method!?
Everything is an object in Python, and every object has attributes. Functions/methods also have attributes. For example:
def foo(): pass
print(foo.__name__)
This is nothing special syntax-wise.
How does this work? How does the delay method get added to add?
The @app.task decorator does that.
Does the Celery worker do the work of adding 4+4? As opposed to the work being done by the caller of that method?
Yes, the worker does that; otherwise this would be pretty nonsensical. You're passing two arguments (4 and 4) to the Celery system, which passes them on to the worker, and the worker does the actual work, in this case addition.
If the answer to 3 is yes, then how does Celery know it has to do some work? All we're doing is - importing a method from the module we wrote and call that method. How does control get passed to the Celery worker?
Again, the @app.task decorator abstracts a lot of magic here. This decorator registers the function with the Celery worker pool. It also adds magic properties to the same method that allow you to call that function in the Celery worker pool, namely delay. Imagine this instead:
def foo(): pass
celery.register_worker('foo', foo)
celery.call('foo')
The decorator essentially does just that, without you having to repeatedly write foo in various ways. It uses the function itself as the identifier for you, purely as syntactic sugar, so you don't have to distinguish much between foo() and 'foo' in your code.
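For comparison, here is roughly what a real, minimal task module looks like (the broker URL is an assumed example; any supported broker works):

# tasks.py -- a runnable counterpart to the imaginary snippet above
from celery import Celery

app = Celery("tasks", broker="redis://localhost:6379/0")  # assumed broker URL

@app.task          # registers the function as a task and attaches .delay()
def add(x, y):
    return x + y   # this line executes inside a worker process, not in the caller

After starting a worker with celery -A tasks worker, calling add.delay(4, 4) from an interpreter serializes the arguments, sends them to the broker, and returns an AsyncResult immediately while the worker computes the sum.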