Burst limit of IBM API Connect - ibm-cloud

I found "rate limit" and "burst limit" at Design section of API Designer,
What is the difference of them?
Rate limit can be set at second, minute, hour, day an week time interval.
On the other hand, burst limit can be set only second and minute time interval.
Does it mean same to set 1/1sec rate limit and to set 1/1sec burst limit?

Different Plans can have differing rate limits, both between operations and for the overall limit. This is useful for providing differing levels of service to customers. For example, a "Demo Plan" might enforce a rate limit of ten calls per minute, while a "Full Plan" might permit up to 1000 calls per second.
You can apply burst limits to your Plans, to prevent usage spikes that might damage infrastructure. Multiple burst limits can be set per Plan, at second and minute time intervals.
That said, these two parameters have a different meaning and could be used together. E.g.: I want to permit a total of 1000 calls per hour (rate limit) and a maximum spike of 50 calls per second (burst limit).

Rate limit enforce how many calls (total) are possible for a given time frame. After that the calls are not possible anymore. This is to create staged plans with different limits and charges (like e.g. entry or free, medium, enterprise).
Burst limits are used to manage, e.g., system load by capping the maximum calls for a moment (hence seconds or minutes), to prevent usage spikes. They can be used to make sure the allowed number of API calls (the rate limit) is evenly spread across the set time frame (day, week, month). They can also be used to protect the backend system from overloading.
So you could set a rate limit of 1000 API calls for a week and the burst limit to 100 calls a minute. If there were 10 "heavy" minutes, the entire rate would have been consumed. An user could also use 100+ calls per day to reach the 1000 calls a week.

Related

Graph Of utilization

My problem is as follows: I would like to create a graph of the percentage use of boxes over 24 hours. However, the box.utilization() function is cumulative, so I tried to solve the problem by creating a dataset that collects the values every hour and an event that resets the utilization so that the next hour is not affected by the previous hour's utilization.
(I attach a picture of the graph I created).
Is there a more efficient way?
I have faced the same issue before. Here is how I handled it:
Instead of cumulative utilization, I calculate the maximum hourly utilization. That is, I record the number of seized resource for every minute and get an array of 60 elements. Then divide the maximum number in that array by the total number of resources available. An example:
I have 100 machines
During an hour, maximum of 60 of them were busy
60/100= 60% maximum utilization during that hour
Then I plot these for each hour.

What period is P99 or P95 calculated over?

When committing to/setting SLAs for a service, what time period should the SLA be calculated over?
For example, if I wanted all the services in my organization to commit to P95 latency, and one of the services commits to 500ms, what is the time window - because the P95 will be different based on the time window we look at.
Depends on in what cycles your latency fluctuates.
No daily / hourly peaks? A couple thousand samples do just fine.
Daily fluctuations (e.g. peak usage, concurrent backups etc.)? Then you will need to measure at least a whole day.
Weekly fluctuations (e.g. tied to work hours or evening activities etc.)? Then you will need to sample over a full week.
There is no strict requirement to sample everything over the chosen time window, but your time window better be representative or you may be held liable. Also make sure to be fair when you under-sample.
If you want to be on the safe side, take the worst-case-scenario in your load cycle, and within that scenario take a full minute worth of samples. That gives you a good estimate what will be held against you.

prometheus: is it possible to use event number of gauge as a counter?

I use prometheus to monitor a api service. Currently, I use a Counter to count number of requests received and a Gauge for the response time in milliseconds.
I've tried to use something like count_over_time(response_time_ms[1m]) to count requests during a time range. However, I got result that each point is value of 10.
Why this doesn't work?
count_over_time(response_time_ms[1m]) will tell you the number of samples, not the number of times your Gauge was updated within (what I assume to be) a Java process. Based on the value of 10 you're seeing, I'm assuming your scrape interval is 6 seconds.
For an explanation of why this doesn't work as you would expect it, a Gauge is simply a Java object wrapping a double value. Every time you set its value, that value changes, but nothing more. There's no count of how many times the value changed or any notification sent to Prometheus that this happened. Prometheus simply polls every 6 seconds and collects whatever value was there at the time (never the wiser that the value changed 15 times since the last time it was collected). This is why gauges are intended to measure single values that go up and down (such as memory utilization: it's now 645 MB, in 6 seconds it's 648 MB, in 12 seconds 543 MB): you know the value constantly changes, but the best you can do is sample it every now and then.
For something like request latency, you should use a Histogram: it's basically a counter for the number of observations (i.e. number of requests); a counter for the sum of all observations (i.e. how long all requests put together took); and separate counters for each bucket (i.e. how many requests took less than 1 ms; how many requests took less than 10 ms; etc.). From this you can get an accurate average over any multiple of your scrape interval (i.e. change in total time divided by change in number of requests) as well as estimates for any percentile (including the median). How precise said percentiles are depends on the bucket sizes you choose (and how well they actually match the actual measurements).
Or, if all you're interested in is the number of requests, then a counter that's incremented on every request will be enough. To adjust for counter resets (e.g. job restarts), you should use increase() rather than the simple difference suggested above:
increase(number_of_requests_total[1m])
If you want to count number of requests in some specific time from now (in last 1m in this case) just use
number_of_requests_counter - number_of_requests_counter offset 1m
If you want to have sth like requests per second, than use
rate(number_of_requests_counter[1m])
I can tell you why it's not working with your Gauge, but first of all specify what do you assign to this metric. I mean, do you assing some avarage, last response time, or some other stuff?
For response time you should use Summary or Histogram (more info here)

How can a Neural Network learn from testing outputs against external conditions which it can not directly control

In order to simplify the question and hopefully the answer I will provide a somewhat simplified version of what I am trying to do.
Setting up fixed conditions:
Max Oxygen volume permitted in room = 100,000 units
Target Oxygen volume to maintain in room = 100,000 units
Maximum Air processing cycles per sec == 3.0 cycles per second (min is 0.3)
Energy (watts) used per second is this formula : (100w * cycles_per_second)SQUARED
Maximum Oxygen Added to Air per "cycle" = 100 units (minimum 0 units)
1 person consumes 10 units of O2 per second
Max occupancy of room is 100 person (1 person is min)
inputs are processed every cycle and outputs can be changed each cycle - however if an output is fed back in as an input it could only affect the next cycle.
Lets say I have these inputs:
A. current oxygen in room (range: 0 to 1000 units for simplicity - could be normalized)
B. current occupancy in room (0 to 100 people at max capacity) OR/AND could be changed to total O2 used by all people in room per second (0 to 1000 units per second)
C. current cycles per second of air processing (0.3 to 3.0 cycles per second)
D. Current energy used (which is the above current cycles per second * 100 and then squared)
E. Current Oxygen added to air per cycle (0 to 100 units)
(possible outputs fed back in as inputs?):
F. previous change to cycles per second (+ or - 0.0 to 0.1 cycles per second)
G. previous cycles O2 units added per cycle (from 0 to 100 units per cycle)
H. previous change to current occupancy maximum (0 to 100 persons)
Here are the actions (outputs) my program can take:
Change cycles per second by increment/decrement of (0.0 to 0.1 cycles per second)
Change O2 units added per cycle (from 0 to 100 units per cycle)
Change current occupancy maximum (0 to 100 persons) - (basically allowing for forced occupancy reduction and then allowing it to normalize back to maximum)
The GOALS of the program are to maintain a homeostasis of :
as close to 100,000 units of O2 in room
do not allow room to drop to 0 units of O2 ever.
allows for current occupancy of up to 100 people per room for as long as possible without forcibly removing people (as O2 in room is depleted over time and nears 0 units people should be removed from room down to minimum and then allow maximum to recover back up to 100 as more and more 02 is added back to room)
and ideally use the minimum energy (watts) needed to maintain above two conditions. For instance if the room was down to 90,000 units of O2 and there are currently 10 people in the room (using 100 units per second of 02), then instead of running at 3.0 cycles per second (90 kw) and 100 units per second to replenish 300 units per second total (a surplus of 200 units over the 100 being consumed) over 50 seconds to replenish the deficit of 10,000 units for a total of 4500 kw used. - it would be more ideal to run at say 2.0 cycle per second (40 kw) which would produce 200 units per second (a surplus of 100 units over consumed units) for 100 seconds to replenish the deficit of 10,000 units and use a total of 4000 kw used.
NOTE: occupancy may fluctuate from second to second based on external factors that can not be controlled (lets say people are coming and going into the room at liberty). The only control the system has is to forcibly remove people from the room and/or prevent new people from coming into the room by changing the max capacity permitted at that next cycle in time (lets just say the system could do this). We don't want the system to impose a permanent reduction in capacity just because it can only support outputting enough O2 per second for 30 people running at full power. We have a large volume of available O2 and it would take a while before that was depleted to dangerous levels and would require the system to forcibly reduce capacity.
My question:
Can someone explain to me how I might configure this neural network so it can learn from each action (Cycle) it takes by monitoring for the desired results. My challenge here is that most articles I find on the topic assume that you know the correct output answer (ie: I know A, B, C, D, E inputs all are a specific value then Output 1 should be to increase by 0.1 cycles per second).
But what I want is to meet the conditions I laid out in the GOALS above. So each time the program does a cycle and lets say it decides to try increasing the cycles per second and the result is that available O2 is either declining by a lower amount than it was the previous cycle or it is now increasing back towards 100,000, then that output could be considered more correct than reducing cycles per second or maintaining current cycles per second. I am simplifying here since there are multiple variables that would create the "ideal" outcome - but I think I made the point of what I am after.
Code:
For this test exercise I am using a Swift library called Swift-AI (specifically the NeuralNet module of it : https://github.com/Swift-AI/NeuralNet
So if you want to tailor you response in relation to that library it would be helpful but not required. I am more just looking for the logic of how to setup the network and then configure it to do initial and iterative re-training of itself based on those conditions I listed above. I would assume at some point after enough cycles and different conditions it would have the appropriate weightings setup to handle any future condition and re-training would become less and less impactful.
This is a control problem, not a prediction problem, so you cannot just use a supervised learning algorithm. (As you noticed, you have no target values for learning directly via backpropagation.) You can still use a neural network (if you really insist). Have a look at reinforcement learning. But if you already know what happens to the oxygen level when you take an action like forcing people out, why would you learn such a simple facts by millions of evaluations with trial and error, instead of encoding it into a model?
I suggest to look at model predictive control. If nothing else, you should study how the problem is framed there. Or maybe even just plain old PID control. It seems really easy to make a good dynamical model of this process with few state variables.
You may have a few unknown parameters in that model that you need to learn "online". But a simple PID controller can already tolerate and compensate some amount of uncertainty. And it is much easier to fine-tune a few parameters than to learn the general cause-effect structure from scratch. It can be done, but it involves trying all possible actions. For all your algorithm knows, the best action might be to reduce the number of oxygen consumers to zero permanently by killing them, and then get a huge reward for maintaining the oxygen level with little energy. When the algorithm knows nothing about the problem, it will have to try everything out to discover the effect.

Higher concurrency vs lower concurrency

In doing a load test (by using Siege for example) for servers, is a lower concurrency number better?
What does this number signify?
The Siege docs go into detail on concurrency here: https://www.joedog.org/2012/02/17/concurrency-single-siege/
From that page:
The calculation is simple: total transactions divided by elapsed time. If we did 100 transactions in 10 seconds, then our concurrency was 10.00.
Higher concurrency measure CAN mean that your server is handling more connections faster but it can also mean that your server is falling behind on calculations and causing connections to be queued. So your concurrency measure is only valuable when taken in context of time elapsed.