How to put a rate limit on a celery queue? - celery

I read this in the celery documentation for Task.rate_limit:
Note that this is a per worker instance rate limit, and not a global rate limit. To enforce a global rate limit (e.g., for an API with a maximum number of requests per second), you must restrict to a given queue.
How do I put a rate limit on a celery queue?

It turns out this can't be done at the queue level across multiple workers.
It can be done at the queue level for one worker, or at the queue level for each worker individually.
So if you set 10 jobs/minute on 5 workers, your workers will collectively process up to 50 jobs per minute.
So to get only 10 jobs per minute overall, you either choose one worker, or choose 5 workers with a limit of 2/minute each.
Update: exactly how to set the limit in settings/configuration:
task_annotations = {'tasks.<task_name>': {'rate_limit': '10/m'}}
or change the same for all tasks:
task_annotations = {'*': {'rate_limit': '10/m'}}
10/m means 10 tasks per minute, /s would mean per second. More details here: Task annotations setting
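A minimal sketch of the arithmetic in this answer: because the limit applies per worker, one way to approximate a global limit is to divide it by the number of workers and put the result into task_annotations. The helper name annotations_for_global_limit and the task name tasks.call_api are hypothetical, and the approximation only holds while exactly that many workers are running.

```python
def annotations_for_global_limit(task_name, global_per_minute, workers):
    """Build a task_annotations dict whose per-worker limits add up to at
    most the desired global limit. Hypothetical helper, not part of Celery."""
    if workers < 1:
        raise ValueError("need at least one worker")
    # Round down so the combined rate never exceeds the global limit.
    per_worker = global_per_minute // workers
    if per_worker < 1:
        raise ValueError("global limit is too low to split across this many workers")
    return {task_name: {"rate_limit": f"{per_worker}/m"}}


# 10 jobs/minute overall, spread over 5 workers.
task_annotations = annotations_for_global_limit("tasks.call_api", 10, 5)
print(task_annotations)
```

If workers are added or removed later, the per-worker value has to be recomputed, which is why a single dedicated worker is the simpler option.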

I was trying to find a way to rate-limit a queue and found out that Celery can't do that. However, Celery can control the rate per task; see:
http://docs.celeryproject.org/en/latest/userguide/workers.html#rate-limits
So as a workaround, you could set up one task per queue (which makes sense in a lot of situations) and put the limit on the task.
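One possible shape for that workaround, written as a plain settings module. task_routes and task_annotations are Celery settings; the module name celeryconfig.py, the task name tasks.call_api and the queue name api_calls are assumptions made for illustration.

```python
# celeryconfig.py (hypothetical settings module)

# Send the one rate-limited task to its own dedicated queue.
task_routes = {
    "tasks.call_api": {"queue": "api_calls"},
}

# Limit that task to 10 executions per minute on each worker instance.
task_annotations = {
    "tasks.call_api": {"rate_limit": "10/m"},
}
```

If only a single worker consumes that queue (for example one started with -Q api_calls), the per-worker limit is effectively the limit for the whole queue.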

You can set this limit in Flower's worker pane.
There is a blank field there for entering your limit.
The suggested format is as follows:
The rate limits can be specified in seconds, minutes or hours by appending “/s”, “/m” or “/h” to the value. Tasks will be evenly distributed over the specified time frame.
Example: “100/m” (hundred tasks a minute). This will enforce a minimum delay of 600ms between starting two tasks on the same worker instance.

Related

Google Cloud Tasks - Maximum number of tasks in a single queue?

Is there a limit set for how many tasks can be created in a single queue? I can't seem to find this info anywhere.
I will probably not execute more than 100 at a time, but I will need to have a lot more waiting in the queue.
No, there's none. The only limit mentioned is for the number of Queues that can be added (1000 default, can be increased on your quotas page).
See full details:
https://cloud.google.com/tasks/docs/quotas

Django Celery Rate Limit setting doesn't work as expected

I use the command below to create one worker:
celery -A proj worker -l info --concurrency=50 -Q celery,token_1 -n token_1
And in my task, I set the rate limit to 4000/m.
However, when I started running the collection, I noticed that the average processing rate was only around 10-20 tasks/s (with the 4000/m rate limit enabled).
Then I removed the rate limit, and the task rate went up to around 60/s.
I am confused, since my rate limit is 4000/m, which is roughly 66/s. Why does it end up at just 10-20/s? (I have already set 50 threads for the worker.)
You're misunderstanding how rate limits operate in Celery. According to the documentation for version 4.2:
The rate limits can be specified in seconds, minutes or hours by appending “/s”, “/m” or “/h” to the value. Tasks will be evenly distributed over the specified time frame.
Example: “100/m” (hundred tasks a minute). This will enforce a minimum delay of 600ms between starting two tasks on the same worker instance.
In essence, Celery was adding a forced delay between your tasks. A limit of 4000/m implies a minimum delay of 15 ms (60 s / 4000) between task starts. Since each task was already processing in about 16 ms (1/60 of a second), adding that forced delay between tasks reduced the rate at which they were processed.
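The delay implied by a given limit can be worked out directly from the documentation quote above. This is only the arithmetic, not Celery's internal implementation; the function name min_delay_ms is hypothetical, and treating a bare number as "per second" is an assumption of this sketch.

```python
def min_delay_ms(rate_limit):
    """Convert a rate string such as '100/m' into the implied minimum
    spacing, in milliseconds, between task starts on one worker."""
    seconds_per_unit = {"s": 1, "m": 60, "h": 3600}
    count, _, unit = rate_limit.partition("/")
    unit = unit or "s"  # assumption: no suffix means "per second"
    return 1000 * seconds_per_unit[unit] / float(count)


for limit in ("100/m", "4000/m"):
    print(limit, "->", min_delay_ms(limit), "ms between task starts")
```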

How does JMeter start sending requests to the server

If the configuration is Threads: 100, Ramp-up: 1 and Loop Count: 1, how will JMeter start sending requests to the server?
Will requests be sent at 1 req/sec, or will all requests be sent to the server at once?
JMeter will send requests as fast as it can, to wit:
It will start all threads (virtual users) you define in Thread Group within the ramp-up period (in your case - 100 threads in 1 second)
Each thread (virtual user) will start executing the Samplers which are present in the Thread Group from top to bottom (or according to the Logic Controllers)
When there are no more samplers to execute or loops to iterate the thread will be shut down
When there are no more active threads left - JMeter test will end.
With regards to requests per second - it mostly depends on your application response time, i.e.
if you have 100 virtual users and response time is 1 second - you will get 100 requests/second
if you have 100 virtual users and response time is 2 seconds - you will get 50 requests/second
if you have 100 virtual users and response time is 500 milliseconds - you will get 200 requests/second
etc.
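The three figures above follow from one division. A small sketch, assuming every virtual user sends its next request as soon as the previous response arrives (no timers or think time); the function name expected_throughput is hypothetical.

```python
def expected_throughput(virtual_users, response_time_seconds):
    """Requests per second when each user is always waiting on exactly one request."""
    return virtual_users / response_time_seconds


for response_time in (1.0, 2.0, 0.5):
    rate = expected_throughput(100, response_time)
    print(f"100 users, {response_time} s response time: {rate} requests/second")
```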
I would recommend increasing (and decreasing) the load gradually, this way you will be able to correlate increasing load with increasing throughput/response time/number of errors, etc. while releasing all threads at once will not tell you the full story (unless you're doing a form of spike testing, in this case consider using Synchronizing Timer)
Setting JMeter's ramp-up period to 1 means all 100 threads are started within 1 second.
This isn't a recommended setting, as described below:
The ramp-up period tells JMeter how long to take to "ramp-up" to the full number of threads chosen. If 10 threads are used, and the ramp-up period is 100 seconds, then JMeter will take 100 seconds to get all 10 threads up and running. Each thread will start 10 (100/10) seconds after the previous thread was begun. If there are 30 threads and a ramp-up period of 120 seconds, then each successive thread will be delayed by 4 seconds.
Ramp-up needs to be long enough to avoid too large a work-load at the start of a test, and short enough that the last threads start running before the first ones finish (unless one wants that to happen).
Start with Ramp-up = number of threads and adjust up or down as needed.
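The start delay described in the quoted documentation is the ramp-up period divided by the number of threads. A short sketch of that calculation; the function name thread_start_offsets is hypothetical.

```python
def thread_start_offsets(threads, ramp_up_seconds):
    """Return the start time (in seconds from test start) of each thread,
    assuming the threads are spread evenly over the ramp-up period."""
    delay = ramp_up_seconds / threads
    return [i * delay for i in range(threads)]


# 10 threads over 100 seconds: one new thread every 10 seconds.
print(thread_start_offsets(10, 100))
# 100 threads over 1 second (the configuration in the question):
# a new thread roughly every 10 ms.
print(thread_start_offsets(100, 1)[:3])
```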
See also Can i set ramp up period 0 in JMeter?
Bear in mind that with a low ramp-up and many threads you may be limited by local resources, so your results may measure the client's capability rather than the server's.

Scheduling policies in Linux Kernel

Can there be more than two scheduling policies working at the same time in Linux Kernel ?
Can FIFO and Round Robin be working on the same machine ?
Yes, Linux supports no fewer than 4 different scheduling methods for tasks: SCHED_BATCH, SCHED_OTHER (the default policy, handled by the fair scheduler), SCHED_FIFO and SCHED_RR.
Regardless of scheduling method, all tasks also have a fixed hard priority (which is 0 for batch and fair tasks and from 1 to 99 for the RT scheduling methods FIFO and RR). Tasks are first and foremost picked by priority: the highest priority wins.
However, when several tasks with the same priority are available to run, that is where the scheduling method kicks in: a fair task will only run for its allotted weighted share of the CPU time with regard to other fair tasks (with the weight coming from a soft priority called the task's nice level), an RR task will run for a fixed time slice before yielding to another task of the same priority (higher-priority tasks always win), and a FIFO task will run until it blocks or yields, disregarding other tasks with the same priority.
Please note that what I wrote above is accurate but not complete, because it does not take into account advanced CPU reservation features, but it gives the details of how the different scheduling methods interact with each other.
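To see that these policies coexist on one machine, the priority range the kernel reports for each can be listed from user space. A small sketch using Python's os module; the function name priority_ranges is hypothetical, and the constants are only available on platforms that expose them (the code skips any that are missing). On a stock Linux kernel FIFO and RR are expected to report 1 to 99 and the other two 0.

```python
import os


def priority_ranges():
    """Map each scheduling policy known to this Python build to the
    (min, max) static priority the kernel reports for it."""
    ranges = {}
    if not hasattr(os, "sched_get_priority_min"):
        return ranges  # platform does not expose the scheduler API
    for name in ("SCHED_OTHER", "SCHED_BATCH", "SCHED_FIFO", "SCHED_RR"):
        policy = getattr(os, name, None)  # constant may be absent on some systems
        if policy is None:
            continue
        ranges[name] = (os.sched_get_priority_min(policy),
                        os.sched_get_priority_max(policy))
    return ranges


for name, (low, high) in priority_ranges().items():
    print(f"{name}: static priority {low}..{high}")
```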
Yes! Nowadays different scheduling policies are used at different stages in the OS. Round robin is generally applied before a process gets core execution, and FIFO is applied at the start stage of a newly arriving process.

Least load scheduler

I'm working on a system that uses several hundred workers in parallel (physical devices evaluating small tasks). Some workers are faster than others, so I was wondering what the easiest way is to load-balance tasks across them without a priori knowledge of their speed.
I was thinking about keeping track of the number of tasks a worker is currently working on with a simple counter and then sorting the list to get the worker with the lowest active task count. This way slow workers would get some tasks but not slow down the whole system. The reason I'm asking is that the current round-robin method is causing hold up with some really slow workers (100 times slower than others) that keep accumulating tasks and blocking new tasks.
It should be a simple matter of sorting the list according to the current number of active tasks, but since I would be sorting the list several times a second (average work time per task is below 25ms) I fear that this might be a major bottleneck. So is there a simple way of getting the worker with the lowest task count without having to sort over and over again?
EDIT: The tasks are pushed to the workers via an open TCP connection. Since the dependencies between the tasks are rather complex (exclusive resource usage) let's say that all tasks are assigned to start with. As soon as a task returns from the worker all tasks that are no longer blocked are queued, and a new task is pushed to the worker. The work queue will never be empty.
How about this system:
Worker reaches the end of its task queue
Worker requests more tasks from load balancer
Load balancer assigns N tasks (where N is probably more than 1, perhaps 20 - 50 if these tasks are very small).
In this system, since you are assigning new tasks when the workers are actually done, you don't have to guess at how long the remaining tasks will take.
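The steps above can be compared with round-robin in a small deterministic simulation. This is only a sketch under stated assumptions: each worker has a fixed per-task time in milliseconds, requesting a batch costs nothing, and the function names and the sample numbers (two fast workers, one 100 times slower) are made up for illustration.

```python
import heapq


def simulate_pull(task_count, task_ms, batch=1):
    """Workers ask for `batch` tasks whenever they run out of work.
    Returns (finish time in ms, tasks completed per worker)."""
    # Heap of (time at which the worker becomes idle, worker index).
    idle_at = [(0, w) for w in range(len(task_ms))]
    heapq.heapify(idle_at)
    done = [0] * len(task_ms)
    remaining, finish = task_count, 0
    while remaining > 0:
        now, w = heapq.heappop(idle_at)   # next worker to request work
        n = min(batch, remaining)
        remaining -= n
        done[w] += n
        busy_until = now + n * task_ms[w]
        finish = max(finish, busy_until)
        heapq.heappush(idle_at, (busy_until, w))
    return finish, done


def simulate_round_robin(task_count, task_ms):
    """Tasks are dealt out in turn regardless of worker speed."""
    done = [0] * len(task_ms)
    for i in range(task_count):
        done[i % len(task_ms)] += 1
    finish = max(n * t for n, t in zip(done, task_ms))
    return finish, done


speeds = [10, 10, 1000]  # ms per task; the third worker is 100x slower
print("round robin:", simulate_round_robin(300, speeds))
print("pull based: ", simulate_pull(300, speeds))
```

With pulling, the slow worker only ever holds the batch it asked for, so no counter or sorted list is needed. A larger batch reduces request overhead but lets a slow worker hold more tasks at once.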
I think that you need to provide more information about the system:
How do you get a task to a worker? Does the worker request it or does it get pushed?
How do you know if a worker is out of work, or even how much work it is doing?
How are the physical devices modeled?
What you want to do is avoid tracking anything and find a more passive way to distribute the work.