How to configure a circuit breaker in Celery 5

Celery 5 introduces the concept of a Circuit Breaker into the framework here.
Does anyone know how to configure the circuit breaker? An example would be appreciated!

I would not trust a two-year-old document (Last-Modified: 2019-04-08) to reflect the current state of Celery. Moreover, the document contains enhancement ideas; it is not a changelog or similar.
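Since the design document does not describe a released configuration API, one fallback is to apply the circuit-breaker pattern yourself around the calls your tasks make to a flaky dependency. The sketch below is plain Python, not a Celery feature; CircuitBreaker, CircuitOpenError and their parameters (failure_threshold, reset_timeout) are hypothetical names chosen for illustration. After failure_threshold consecutive failures it rejects calls until reset_timeout seconds have passed, then lets a trial call through (the "half-open" state).

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the breaker is open."""


class CircuitBreaker:
    """Hypothetical closed/open/half-open breaker (not a Celery API, not thread-safe)."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call through
        return "open"

    def call(self, func: Callable[..., T], *args, **kwargs) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit is open; call rejected")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Inside a task you might wrap the outgoing call with breaker.call(...) and treat CircuitOpenError as a signal to retry later. Note that each worker process would hold its own breaker state; sharing state across workers would need an external store.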

Related

Implement a TaskQueue (like Celery) with ETA/Countdown

Many popular task queues (such as Google GAE TaskQueue, Celery) have the ETA/Countdown feature, which allows a task to be put into the queue after N seconds.
I am working on a project that needs a task queue with the ETA feature. However, due to some constraints, I have to use the Google Pub/Sub messaging system, and Pub/Sub does not have an ETA feature. I am wondering how to implement a reliable and scalable ETA mechanism for a task queue. Both general architecture ideas and actual code samples are welcome.
Our system enqueues 600-2000 tasks/second, and about 10% of them need to have ETA. It is a distributed system and performance-critical.
I tried to trace the Celery source code, but couldn't find the logic that actually handles the ETA. It would also be good if someone could point me to the file/code in Celery that handles ETA.
I think I might have found how Celery does it. In eventlet.py, it uses eventlet's spawn_after feature to delay spawning the entry until its ETA, i.e. by the number of seconds remaining until the ETA:
secs = max(eta - monotonic(), 0)
g = self._spawn_after(secs, entry)
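For the Pub/Sub case in the question, one possible design, sketched here under our own assumptions, is a holding area: ETA tasks sit in a min-heap ordered by due time, and a scheduler loop publishes them once they are due. The waiting time is computed exactly like the Celery snippet, max(eta - now, 0). The class name DelayedQueue and the publish callback are hypothetical; the heap lives in one process's memory and uses a monotonic clock, so a real deployment at 600-2000 tasks/second would need to persist and partition it, storing absolute wall-clock ETAs, to survive restarts.

```python
import heapq
import itertools
import time
from typing import Any, Callable, List, Tuple


class DelayedQueue:
    """Hypothetical in-memory ETA holding area in front of a real queue."""

    def __init__(self, publish: Callable[[Any], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._heap: List[Tuple[float, int, Any]] = []
        self._counter = itertools.count()  # tie-breaker so tasks are never compared
        self._publish = publish  # stands in for the real Pub/Sub publish call
        self._clock = clock

    def schedule(self, task: Any, countdown: float) -> None:
        """Hold `task` until `countdown` seconds from now."""
        eta = self._clock() + countdown
        heapq.heappush(self._heap, (eta, next(self._counter), task))

    def release_due(self) -> int:
        """Publish every task whose ETA has passed; return how many were released."""
        now = self._clock()
        released = 0
        while self._heap and self._heap[0][0] <= now:
            _, _, task = heapq.heappop(self._heap)
            self._publish(task)
            released += 1
        return released

    def seconds_until_next(self) -> float:
        """Same computation as the Celery snippet: max(eta - now, 0)."""
        if not self._heap:
            return float("inf")
        return max(self._heap[0][0] - self._clock(), 0.0)
```

A scheduler loop would call release_due(), then sleep for min(seconds_until_next(), some cap) so that newly scheduled tasks are picked up promptly.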

Concurrency, how to create an efficient actor setup?

Alright, so I have never done intense concurrent operations like this before; there are three main parts to this algorithm.
This all starts with a Vector of around 1 million items.
Each item gets processed in 3 main stages.
Task 1: Make an HTTP request, then convert the received data into a map of around 50 entries.
Task 2: Receive the map and do some computations to generate a class instance based on the info found in the map.
Task 3: Receive the class and generate/add to multiple output files.
I initially started out by concurrently running task 1 with 64K entries across 64 threads (1,024 entries per thread), generating the threads in a for loop.
This worked well and was relatively fast, but I keep hearing about actors and how they are heaps better than basic Java threads/thread pools. I've created a few actors, but I don't know where to go from here.
Basically:
1. Are actors the right way to achieve fast concurrency for this specific set of tasks, or is there another way I should go about it?
2. How do you know how many threads/actors are too many? Specifically in task 1, how do you know what the limit on the number of simultaneous connections is (I'm on a Mac)? Is there a golden rule to follow? How many threads vs. how large a thread pool? And what are the actor equivalents?
3. Is there any code I can look at that implements actors in a similar fashion? All the code I'm seeing either gets an actor to print hello world or is super complex.
1) Actors are a good choice to design complex interactions between components since they resemble "real life" a lot. You can see them as different people sending each other requests, it is very natural to model interactions. However, they are most powerful when you want to manage changing state in your application, which does not seem to be the case for you. You can achieve fast concurrency without actors. Up to you.
2) If none of your operations are blocking, the best rule is number of threads = number of CPUs. If you use a non-blocking HTTP client, and NIO when writing your output files, then you should be fully non-blocking on I/O and can safely set your app's thread count to the number of CPUs on your machine.
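To illustrate the "fully non-blocking" approach, here is a minimal asyncio sketch in Python (a different stack from the Scala/Akka one discussed, used only to show the shape). fetch and compute are hypothetical placeholders: fetch simulates an HTTP call with asyncio.sleep instead of doing real network I/O, and max_connections is an assumed cap on simultaneous requests. One thread drives all requests, while a semaphore limits how many are in flight at once.

```python
import asyncio
from typing import Dict, List


async def fetch(item: int) -> Dict[str, int]:
    """Hypothetical stand-in for a non-blocking HTTP call (no real I/O)."""
    await asyncio.sleep(0.01)  # simulated latency; the thread is free meanwhile
    return {"id": item, "value": item * 2}


def compute(data: Dict[str, int]) -> int:
    """Stage 2 placeholder: CPU work on the fetched map."""
    return data["value"] + 1


async def process_all(items: List[int], max_connections: int = 100) -> List[int]:
    # The semaphore caps simultaneous requests; one event loop drives them all.
    sem = asyncio.Semaphore(max_connections)

    async def one(item: int) -> int:
        async with sem:
            data = await fetch(item)
        return compute(data)

    # gather preserves the input order in its result list.
    return list(await asyncio.gather(*(one(i) for i in items)))


if __name__ == "__main__":
    results = asyncio.run(process_all(list(range(1000)), max_connections=64))
    print(len(results), results[:3])
```

With a real non-blocking client the structure stays the same and only fetch changes; CPU-heavy compute steps would still belong on a pool sized to the number of cores.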
3) The documentation on http://akka.io is very very good and comprehensive. If you have no clue how to use the actor model I would recommend getting a book - not necessarily about Akka.
1) It sounds like most of your steps aren't stateful, in which case actors add complication for no real benefit. If you need to coordinate multiple tasks in a mutable way (e.g. for generating the output files) then actors are a good fit for that piece. But the HTTP fetches should probably just be calls to some nonblocking HTTP library (e.g. spray-client - which will in fact use actors "under the hood", but in a way that doesn't expose the statefulness to you).
2) With blocking threads you pretty much have to experiment and see how many you can run without consuming too many resources. Worry about how many simultaneous connections the remote system can handle rather than hitting any "connection limits" on your own machine (it's possible you'll hit the file descriptor limit but if so best practice is just to increase it). Once you figure that out, there's no value in having more threads than the number of simultaneous connections you want to make.
As others have said, with nonblocking everything you should probably just have a number of threads similar to the number of CPU cores (I've also heard "2x number of CPUs + 1", on the grounds that that ensures there will always be a thread available whenever a CPU is idle).
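The sizing rules of thumb from these answers can be written down as small helpers. This is a hedged sketch: the function names are hypothetical, 32 is an assumed remote connection limit, and the numbers returned are starting points for the experimentation described above, not hard limits.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def cpu_bound_pool_size() -> int:
    """Non-blocking or CPU-bound work: roughly one thread per core."""
    return os.cpu_count() or 1


def alt_cpu_pool_size() -> int:
    """The "2x number of CPUs + 1" variant."""
    return 2 * cpu_bound_pool_size() + 1


def blocking_io_pool_size(max_connections: int) -> int:
    """Blocking calls: threads beyond the connections you intend to open add nothing."""
    return max(1, max_connections)


if __name__ == "__main__":
    # 32 is an assumed limit on what the remote system can handle.
    with ThreadPoolExecutor(max_workers=blocking_io_pool_size(32)) as io_pool:
        sizes = list(io_pool.map(len, ["a", "bb", "ccc"]))  # placeholder work
    print(sizes, cpu_bound_pool_size(), alt_cpu_pool_size())
```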
With actors I wouldn't worry about having too many. They're very lightweight.
If you really have no experience with Akka, try starting with something simple, like a one-to-one rewrite of your code that replaces each thread with an actor. That will make it easier to grasp how things work in Akka.
Spin up two actors at the beginning: one for receiving requests and one for writing to the output file. Then, when a request is received, have the request-receiver actor create an actor that does the computation and sends the result to the writing actor.
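The message flow in that last answer (a receiver, short-lived per-request workers, and a single writer that owns the output) can be sketched with Python threads and queues standing in for actors. This is only an analogy under our own assumptions: Python threads are far heavier than Akka actors, so one thread per request would not scale to a million items, and the squaring and even/odd routing are hypothetical placeholders for the real computation and output files. What it shows is that only the writer touches the output, so the output needs no locking.

```python
import queue
import threading
from typing import Dict, List

STOP = object()  # sentinel telling the writer "actor" to shut down


def writer_actor(mailbox: "queue.Queue", output: Dict[str, List[int]]) -> None:
    """The only code that touches `output`, so no lock is needed on it."""
    while True:
        msg = mailbox.get()
        if msg is STOP:
            break
        key, value = msg
        output.setdefault(key, []).append(value)


def compute_actor(request: int, writer_mailbox: "queue.Queue") -> None:
    """Short-lived per-request worker: compute, then message the writer."""
    result = request * request  # placeholder computation
    key = "even" if request % 2 == 0 else "odd"  # placeholder output "file"
    writer_mailbox.put((key, result))


def receiver_actor(requests: List[int], writer_mailbox: "queue.Queue") -> None:
    """Spawns one worker per request, waits for them, then stops the writer."""
    workers = [threading.Thread(target=compute_actor, args=(r, writer_mailbox))
               for r in requests]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    writer_mailbox.put(STOP)


def run(requests: List[int]) -> Dict[str, List[int]]:
    output: Dict[str, List[int]] = {}
    writer_mailbox: "queue.Queue" = queue.Queue()
    writer = threading.Thread(target=writer_actor, args=(writer_mailbox, output))
    writer.start()
    receiver_actor(requests, writer_mailbox)
    writer.join()
    return output
```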

Is it mandatory to have a master actor in Akka?

I am trying to learn a bit about akka actors (in scala) and I came up with this question that I couldn't find an answer to.
Do you necessarily need to create a master actor and from there create the workers with a workerRouter?
Or can you just skip this step and go directly to create workers with a workerRouter from your Main object?
Let me know if you need any code, but I'm basically following the HelloWorld for akka.
Strictly speaking: yes. In terms of business logic: no.
Akka's actors are, by design, hierarchical. That means any actor you create will always have a "parent"/"master": if not one you define yourself, then the /user guardian actor.
However, note that this hierarchical relationship, from Akka's system point of view, concerns the actor lifecycle and child supervision. It does not care about how you wire the actors up with your messages and/or any custom lifecycle handling.
So, from the point of view of your application, you can have your worker actors run as peers with some sort of consensus scheme. They will of course have a system parent (/user if you don't define one yourself), but as long as you don't care about supervision (and if you're just starting to learn Akka, you might not), everything will work fine.
Finally, note that there can be many schemes for working in a "worker pool" setup. For example, this article on work pulling might give you some inspiration on the possible solutions to such problems.

Does the mongoDB broker for Celery work by polling?

The Celery documentation for the mongoDB broker does not say whether or not it works by polling. I read in this blog post that pub/sub is possible with mongoDB, but I don't know if that's what the mongoDB broker for Celery does.
Two sub-questions:
if the broker works by polling, what is the frequency and how can I configure it?
if the broker works with tailable cursors, is it compatible with sharding (by queue name)?
Thanks a lot.
I took a peek at the source code: Celery is based on Kombu, and judging from the mongoDB transport source code (kombu.transport.mongodb), the drain_events method is simply inherited from the kombu.transport.virtual.Transport class, which simply polls every second.
One can override the polling interval by setting the polling_interval attribute in the transport options (see this commit).
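Assuming a Celery 4+ setup (lowercase setting names; older versions use BROKER_TRANSPORT_OPTIONS) and a Kombu version whose MongoDB transport still inherits polling from the virtual transport, the interval could be set in a configuration module like the one below. The broker URL is a placeholder, and 0.5 is an arbitrary example value.

```python
# celeryconfig.py: loaded with app.config_from_object("celeryconfig")
broker_url = "mongodb://localhost:27017/celery"  # placeholder host/database

# Passed through to the Kombu transport; the answer above reports a 1-second default.
broker_transport_options = {"polling_interval": 0.5}  # seconds between polls
```

A shorter interval lowers latency at the cost of more queries against MongoDB.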

Is there a good open-source MongoDB Queue Implementation for the C# Driver?

It's not that it wouldn't be easy (or fun) enough to write one, but it makes sense not to reinvent the wheel, so to speak. I've had a look around at various attempts, but I haven't yet come across an implementation that supports these criteria:
Simple OSS queue system with MongoDB persistence
C# Driver (official) based (so full POCO serialization)
Tailable cursors rather than polling
handles message timeout (GC correctly)
handles consumer failure (ideally crash-detecting re-insertion, but timeout with delayed re-insertion is fine), so findAndModify on completion
multiple writers, multiple consumers
threadsafe
Nice to have:
allows for latest-only messages (replacing older messages in the queue)
If anyone has a nice, simple library like that floating around on GitHub that I've not yet found, please speak up!
There's my little project - a .net message bus implementation that works with MS SQL queues or MongoDB (MongoDB support is a recent addition). Link: http://code.google.com/p/nginn-messagebus/ and http://nginn.org/blog for some examples.
I'm not sure if this is what you're looking for, it's also lacking in documentation and example departments and it doesn't exactly match your specs (polling instead of tailing) - but maybe it's worth giving a try. This is a publish-subscribe message bus, like NServiceBus or MassTransit - not a raw message queue.
PS: I'm afraid there are mutually exclusive requirements in your specs: you can't use a tailable cursor with concurrent consumers because you lose atomicity. If you want to tail a queue, you should use only a single consumer.
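The atomicity point can be illustrated with an in-memory model of the "claim with timeout" semantics the question asks for. This is not a MongoDB or C# client: LeaseQueue is a hypothetical Python class, the lock stands in for the atomicity a single findAndModify gives you on the server, and lease_seconds is an assumed visibility timeout. A consumer that crashes without calling complete simply lets its lease expire, and the message becomes claimable again.

```python
import threading
import time
from typing import Any, Callable, Dict, List, Optional, Tuple


class LeaseQueue:
    """Hypothetical in-memory model of a findAndModify-style work queue."""

    def __init__(self, lease_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lock = threading.Lock()  # models the server-side atomicity of findAndModify
        self._pending: List[Tuple[int, Any]] = []
        self._in_flight: Dict[int, Tuple[float, Any]] = {}  # id -> (deadline, payload)
        self._next_id = 0
        self._lease = lease_seconds
        self._clock = clock

    def put(self, payload: Any) -> int:
        with self._lock:
            msg_id = self._next_id
            self._next_id += 1
            self._pending.append((msg_id, payload))
            return msg_id

    def claim(self) -> Optional[Tuple[int, Any]]:
        """Atomically hand one message to exactly one consumer, with a deadline."""
        with self._lock:
            self._requeue_expired()
            if not self._pending:
                return None
            msg_id, payload = self._pending.pop(0)
            self._in_flight[msg_id] = (self._clock() + self._lease, payload)
            return msg_id, payload

    def complete(self, msg_id: int) -> bool:
        """Acknowledge a message; False if it was unknown or already completed."""
        with self._lock:
            return self._in_flight.pop(msg_id, None) is not None

    def _requeue_expired(self) -> None:
        # Caller holds the lock. Expired leases mean the consumer likely failed.
        now = self._clock()
        expired = [i for i, (deadline, _) in self._in_flight.items() if deadline <= now]
        for msg_id in sorted(expired):
            _, payload = self._in_flight.pop(msg_id)
            self._pending.append((msg_id, payload))
```

Because each claim removes the message from the pending set in one atomic step, two consumers can never receive the same message at the same time, which is exactly what tailing a cursor from several consumers cannot guarantee.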