I initialize the Guava rate limiter as shown below and expect 2 acquisitions per second, ~500 ms each. However, the wait times are zero for the first 4-5 acquires, then the normal ~500 ms for the rest. When I stop sending "ticks" and wait for 1-2 seconds, the wait times go back to zero for 4-5 acquires, then return to the normal ~500 ms each. I think this has to do with bursting. The question is: how can I disable bursting so that I get a steady ~500 ms for 2/second, ~250 ms for 4/second, etc.? Thanks.
RateLimiter rateLimiter = RateLimiter.create( 2 );
I use Prometheus to monitor an API service. Currently, I use a Counter to count the number of requests received and a Gauge for the response time in milliseconds.
I've tried to use something like count_over_time(response_time_ms[1m]) to count requests during a time range. However, every point in the result has the value 10.
Why doesn't this work?
count_over_time(response_time_ms[1m]) will tell you the number of samples, not the number of times your Gauge was updated within (what I assume to be) a Java process. Based on the value of 10 you're seeing, I'm assuming your scrape interval is 6 seconds.
For an explanation of why this doesn't work as you would expect it, a Gauge is simply a Java object wrapping a double value. Every time you set its value, that value changes, but nothing more. There's no count of how many times the value changed or any notification sent to Prometheus that this happened. Prometheus simply polls every 6 seconds and collects whatever value was there at the time (never the wiser that the value changed 15 times since the last time it was collected). This is why gauges are intended to measure single values that go up and down (such as memory utilization: it's now 645 MB, in 6 seconds it's 648 MB, in 12 seconds 543 MB): you know the value constantly changes, but the best you can do is sample it every now and then.
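The sampling behaviour described above can be illustrated with a small simulation. This is only a sketch: the Gauge class is a hypothetical stand-in rather than the real client library, and the update rate of 3 requests per second is an assumed figure.

```python
class Gauge:
    """Hypothetical stand-in for a client-library gauge: it only keeps the last value."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = value


SCRAPE_INTERVAL_S = 6   # scrape interval inferred in the answer
WINDOW_S = 60           # the [1m] range in the query

gauge = Gauge()
updates = 0
samples = []

for second in range(WINDOW_S):
    # Assumed load: 3 requests per second, each overwriting the gauge.
    for request in range(3):
        gauge.set(100.0 + second + request)
        updates += 1
    # The scraper only sees whatever value is present at scrape time.
    if second % SCRAPE_INTERVAL_S == 0:
        samples.append(gauge.value)

count_over_time = len(samples)
print(updates, count_over_time)
```

However many times the gauge is set, the number of samples in the window depends only on the scrape interval, which is why the query returns 10.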
For something like request latency, you should use a Histogram: it's basically a counter for the number of observations (i.e. number of requests); a counter for the sum of all observations (i.e. how long all requests put together took); and separate counters for each bucket (i.e. how many requests took less than 1 ms; how many requests took less than 10 ms; etc.). From this you can get an accurate average over any multiple of your scrape interval (i.e. change in total time divided by change in number of requests) as well as estimates for any percentile (including the median). How precise said percentiles are depends on the bucket sizes you choose (and how well they actually match the actual measurements).
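As a rough illustration of the "change in total time divided by change in number of requests" calculation, here is a sketch that works on two scrapes of a histogram's count and sum. The HistogramSample type and the numbers are made up for the example; they are not part of any Prometheus client API.

```python
from dataclasses import dataclass


@dataclass
class HistogramSample:
    """One scrape of a histogram: cumulative count and cumulative sum."""
    count: int        # total number of observations so far
    total_ms: float   # sum of all observed latencies so far, in ms


def average_latency_ms(earlier, later):
    """Average latency of the requests observed between two scrapes."""
    delta_count = later.count - earlier.count
    if delta_count <= 0:
        return None  # no new requests (or a reset) between the scrapes
    return (later.total_ms - earlier.total_ms) / delta_count


# Hypothetical scrapes taken one interval apart.
earlier = HistogramSample(count=100, total_ms=2500.0)
later = HistogramSample(count=115, total_ms=2950.0)

avg = average_latency_ms(earlier, later)
print(avg)  # 15 new requests took 450 ms in total
```

Because both the count and the sum are counters, every request contributes to the result even if many requests arrive between scrapes.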
Or, if all you're interested in is the number of requests, then a counter that's incremented on every request will be enough. To adjust for counter resets (e.g. job restarts), you should use increase() rather than the simple difference suggested in another answer:
increase(number_of_requests_total[1m])
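To show why a plain difference breaks on counter resets, here is a simplified sketch of reset-aware summing. It is not how Prometheus implements increase() (which also extrapolates to the window boundaries); the function and sample values are assumptions for illustration only.

```python
def increase(samples):
    """Sum the growth between consecutive counter samples.

    A drop in value is treated as a reset to zero, so the new value
    itself counts as growth since the restart.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            total += cur  # counter restarted from 0
    return total


# Hypothetical samples: the job restarts between 160 and 5.
samples = [100, 130, 160, 5, 25]

naive = samples[-1] - samples[0]   # simple difference goes negative
adjusted = increase(samples)       # 30 + 30 + 5 + 20

print(naive, adjusted)
```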
If you want to count the number of requests in a specific period up to now (the last 1m in this case), just use
number_of_requests_counter - number_of_requests_counter offset 1m
If you want something like requests per second, then use
rate(number_of_requests_counter[1m])
I can tell you why it's not working with your Gauge, but first, please specify what you assign to this metric. Do you assign an average, the last response time, or something else?
For response time you should use Summary or Histogram (more info here)
I have a metric in Prometheus called unifi_devices_wireless_received_bytes_total, it represents the cumulative total amount of bytes a wireless device has received. I'd like to convert this to the download speed in Mbps (or even MBps to start).
I've tried:
rate(unifi_devices_wireless_received_bytes_total[5m])
Which I think is saying: "please give me the rate of bytes received per second", over the last 5 minutes, based on the documentation of rate, here.
But I don't understand what "over the last 5 minutes" means in this context.
In short, how can I determine the Mbps based on this cumulative amount of bytes metric? This is ultimately to display in a Grafana graph.
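You want rate(unifi_devices_wireless_received_bytes_total[5m]) / 1000 / 1000, which gives megabytes per second (MBps). For megabits per second (Mbps), multiply the result by 8.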
But I don't understand what "over the last 5 minutes" means in this context.
It's the average over the last 5 minutes.
The rate() function returns the average per-second increase rate for the counter passed to it. The average rate is calculated over the lookbehind window passed in square brackets to rate().
For example, rate(unifi_devices_wireless_received_bytes_total[5m]) calculates the average per-second increase rate over the last 5 minutes. It returns a lower rate than expected when 100MB of data is transferred in 10 seconds, because it divides those 100MB by 5 minutes and returns the average data transfer speed as 100MB/5minutes = 333KB/s instead of 10MB/s.
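The arithmetic in this example can be checked with a few lines. This is a sketch of the averaging only, not of how rate() is implemented; the helper names are made up, and 1 MB is taken as 1,000,000 bytes.

```python
MB = 1000 * 1000  # bytes per megabyte (decimal), an assumption for this sketch


def average_rate(bytes_transferred, window_seconds):
    """Average bytes per second when the transfer is spread over the window."""
    return bytes_transferred / window_seconds


def to_mbps(bytes_per_second):
    """Convert bytes per second to megabits per second."""
    return bytes_per_second * 8 / 1_000_000


burst = 100 * MB                         # 100MB transferred in a 10-second burst

over_5m = average_rate(burst, 5 * 60)    # what a [5m] window averages to
over_10s = average_rate(burst, 10)       # the actual speed during the burst

print(round(over_5m / 1000), "KB/s over 5 minutes")
print(over_10s / MB, "MB/s over 10 seconds")
print(to_mbps(over_10s), "Mbps over 10 seconds")
```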
Unfortunately, using 10s as a lookbehind window doesn't work as expected: it is likely that rate(unifi_devices_wireless_received_bytes_total[10s]) would return nothing. This is because rate() in Prometheus expects at least two raw samples in the lookbehind window. This means that new samples must be written into Prometheus at least every 5 seconds for a [10s] lookbehind window. The solution is to use the irate() function instead of rate():
irate(unifi_devices_wireless_received_bytes_total[5m])
It is likely that this query would return a data transfer rate closer to the expected 10MB/s if the interval between raw samples (aka scrape_interval) is lower than 10 seconds.
Unfortunately, using irate() isn't recommended in the general case, since it tends to return jumpy results when refreshing graphs over big time ranges. Read this article for details.
So the ultimate solution is to use rollup_rate function from VictoriaMetrics - the project I work on. It reliably detects spikes in counter rates by returning the minimum, maximum and average per-second increase rate across all the raw samples on the selected time range.
How long does it take to load a 64-KB program from a disk whose average seek time is 10 msec, whose rotation time is 20 msec, and whose tracks hold 32 KB, for a 2-KB page size?
The pages are spread randomly around the disk, and the number of cylinders is so large that the chance of two pages being on the same cylinder is negligible.
My solution:
The 64-KB program will be organized into 2 tracks, because each track holds 32 KB.
To load an entire track we require 20 msec. To load 2 KB we require 1.25 msec.
I/O time = seek time + avg. rotational latency + transfer time
10 msec + 10 msec + 1.25 msec = 21.25 msec
Since the 64-KB program is organized into 2 tracks, the I/O time will be 2(21.25) = 42.5 msec.
Is this correct? If so, why is the seek time equal to the avg. rotational latency?
The question states that the number of cylinders is so large that the chance of two pages being on the same cylinder is negligible. This simply means that to load each page we always need to move to a different cylinder. Therefore, we have to add the seek time for each page, because seek time is the time it takes to move the arm over the appropriate cylinder.
Number of pages = 64KB/2KB = 32
Rotation time, by definition, is the time it takes to get to the appropriate sector once we have reached the appropriate cylinder.
So Time taken would be = 32*10*20 = 6400 msec
I found another solution:
The seek plus rotational latency is 20 msec. For 2-KB pages, the transfer time is 1.25 msec, for a total of 21.25 msec. Loading 32 of these pages will take 680 msec. For 4-KB pages, the transfer time is doubled to 2.5 msec, so the total time per page is 22.50 msec. Loading 16 of these pages takes 360 msec.
Now I'm confused.
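The per-page arithmetic of the second solution can be reproduced with a short sketch. It assumes, as that solution does, that every page needs its own seek, that the rotational latency averages half a rotation, and that transfer time is proportional to the fraction of a track that one page occupies. The function name and parameters are made up for illustration.

```python
def load_time_ms(program_kb, page_kb, seek_ms=10.0, rotation_ms=20.0, track_kb=32.0):
    """Time to load a program whose pages are scattered over different cylinders."""
    pages = program_kb / page_kb
    rotational_latency_ms = rotation_ms / 2            # average: half a rotation
    transfer_ms = rotation_ms * page_kb / track_kb     # share of one full rotation
    per_page_ms = seek_ms + rotational_latency_ms + transfer_ms
    return pages * per_page_ms


time_2kb = load_time_ms(64, 2)   # 32 pages at 21.25 msec each
time_4kb = load_time_ms(64, 4)   # 16 pages at 22.50 msec each

print(time_2kb, time_4kb)
```

Under these assumptions the seek (10 msec) and the average rotational latency (half of 20 msec) only happen to be equal for this particular disk.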
I found "rate limit" and "burst limit" in the Design section of API Designer.
What is the difference between them?
A rate limit can be set at second, minute, hour, day and week time intervals.
A burst limit, on the other hand, can be set only at second and minute time intervals.
Does that mean setting a 1/1sec rate limit is the same as setting a 1/1sec burst limit?
Different Plans can have differing rate limits, both between operations and for the overall limit. This is useful for providing differing levels of service to customers. For example, a "Demo Plan" might enforce a rate limit of ten calls per minute, while a "Full Plan" might permit up to 1000 calls per second.
You can apply burst limits to your Plans, to prevent usage spikes that might damage infrastructure. Multiple burst limits can be set per Plan, at second and minute time intervals.
That said, these two parameters have a different meaning and could be used together. E.g.: I want to permit a total of 1000 calls per hour (rate limit) and a maximum spike of 50 calls per second (burst limit).
Rate limits enforce how many calls (in total) are possible within a given time frame. After that, no further calls are possible. This is used to create staged plans with different limits and charges (e.g. entry or free, medium, enterprise).
Burst limits are used to manage, e.g., system load by capping the maximum calls for a moment (hence seconds or minutes), to prevent usage spikes. They can be used to make sure the allowed number of API calls (the rate limit) is evenly spread across the set time frame (day, week, month). They can also be used to protect the backend system from overloading.
So you could set a rate limit of 1000 API calls per week and a burst limit of 100 calls per minute. If there were 10 "heavy" minutes, the entire rate limit would have been consumed. A user could also make 100+ calls per day to reach the 1000 calls a week.
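The interaction of the two limits in this example can be sketched with a toy limiter. This is a hypothetical fixed-window implementation written for illustration; it is not how API Designer or the gateway enforces limits, and the class name, method name and traffic pattern are all assumptions.

```python
class PlanLimiter:
    """Toy limiter: a long-window rate limit plus a short-window burst limit."""

    def __init__(self, rate_limit, rate_window_s, burst_limit, burst_window_s):
        self.rate_limit = rate_limit
        self.rate_window_s = rate_window_s
        self.burst_limit = burst_limit
        self.burst_window_s = burst_window_s
        self._rate_window = None
        self._rate_used = 0
        self._burst_window = None
        self._burst_used = 0

    def allow(self, now_s):
        """Return True if a call arriving at time now_s (seconds) is permitted."""
        rate_window = int(now_s // self.rate_window_s)
        burst_window = int(now_s // self.burst_window_s)
        # Reset a counter whenever its window rolls over.
        if rate_window != self._rate_window:
            self._rate_window, self._rate_used = rate_window, 0
        if burst_window != self._burst_window:
            self._burst_window, self._burst_used = burst_window, 0
        if self._rate_used >= self.rate_limit or self._burst_used >= self.burst_limit:
            return False
        self._rate_used += 1
        self._burst_used += 1
        return True


WEEK_S = 7 * 24 * 3600
limiter = PlanLimiter(rate_limit=1000, rate_window_s=WEEK_S,
                      burst_limit=100, burst_window_s=60)

# Assumed traffic: 150 calls at the start of each of 11 consecutive minutes.
allowed_per_minute = []
for minute in range(11):
    allowed = sum(limiter.allow(minute * 60 + i * 0.1) for i in range(150))
    allowed_per_minute.append(allowed)

print(allowed_per_minute)
```

In each of the first ten minutes the burst limit caps the calls at 100; in the eleventh minute the weekly rate limit is already exhausted, so nothing gets through.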
This was an exam question I could not solve, even after reading up on response time.
I thought the answer should be 220, 120.
The effectiveness of RR scheduling depends on two factors: the choice of q, the time quantum, and the scheduling overhead s. If a system contains n processes and each request by a process consumes exactly q seconds, the response time (rt) for a request is rt = n(q+s). This means that the response is generated after the process has spent its whole CPU burst and the scheduler has moved on to the next process (after q+s).
Assume that an OS contains 10 identical processes that were initiated at the same time. Each process contains 15 identical requests, and each request consumes 20 msec of CPU time. A request is followed by an I/O operation that consumes 10 sec. The system consumes 2 msec in CPU scheduling. Calculate the average response time of the first requests issued by each process for the following two cases:
(i) the time quantum is 20msec.
(ii) the time quantum is 10 msec.
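Plugging the given values directly into rt = n(q+s) reproduces the 220 and 120 mentioned above. The sketch below only evaluates that formula; it is not a scheduler simulation. Note that the formula is stated for requests that consume exactly q, which holds in case (i) but not in case (ii), where a 20 msec request spans two 10 msec quanta, so the second value is a plug-in figure only.

```python
def response_time_ms(n, q_ms, s_ms):
    """rt = n(q + s): n processes, quantum q, scheduling overhead s (all in msec)."""
    return n * (q_ms + s_ms)


N_PROCESSES = 10
SCHED_OVERHEAD_MS = 2

case_i = response_time_ms(N_PROCESSES, 20, SCHED_OVERHEAD_MS)    # quantum = 20 msec
case_ii = response_time_ms(N_PROCESSES, 10, SCHED_OVERHEAD_MS)   # quantum = 10 msec

print(case_i, case_ii)
```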
Note that I'm assuming you meant 10ms instead of 10s for the I/O wait time, and that nothing can run on-CPU while an I/O is in progress. In real operating systems, the latter assumption is not true.
Each process should take time 15 requests * (20ms CPU + 10ms I/O)/request = 450ms.
Then, divide by the time quantum to get the number of scheduling delays, and add that to 450ms:
450ms / 20ms = 22.5 but actually it should be 23 because you can't get a partial reschedule. This gives the answer 450ms + 2ms/reschedule * 23 reschedules = 496ms.
450ms / 10ms = 45. This gives the answer 450ms + 2ms/reschedule * 45 reschedules = 540ms.
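The arithmetic in this answer can be written out as a short sketch. It follows the answer's own assumptions (10 ms of I/O per request, nothing else running during I/O, one 2 ms scheduling delay per quantum, rounded up); the function name and parameters are made up for illustration.

```python
import math


def total_time_ms(requests, cpu_ms, io_ms, quantum_ms, sched_ms):
    """Per-process time: work plus one scheduling delay per (started) quantum."""
    work_ms = requests * (cpu_ms + io_ms)
    reschedules = math.ceil(work_ms / quantum_ms)  # no partial reschedules
    return work_ms + sched_ms * reschedules


with_20ms_quantum = total_time_ms(15, 20, 10, quantum_ms=20, sched_ms=2)
with_10ms_quantum = total_time_ms(15, 20, 10, quantum_ms=10, sched_ms=2)

print(with_20ms_quantum, with_10ms_quantum)
```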