We have an RDS PostgreSQL instance of type db.t2.small, and we are seeing something strange with the CPU credit balance metrics.
CPU credit usage is not growing, but the balance has dropped to zero. Does anybody know what the problem could be? (The RDS instance itself is working fine, without any issues.)
I am seeing the same behavior with my t2.micro free-tier RDS instance. My hypothesis right now is that the service window is when the instance gets rebooted or hot-swapped, resulting in a new instance with the default baseline number of credits. This makes Saturday night more appealing than Sunday night for that window, so the credits have re-accumulated by the next business day.
From the documentation, it looks like CPU credits expire 24 hours after being earned.
CPUCreditUsage
[T2 instances] The number of CPU credits consumed by the instance. One
CPU credit equals one vCPU running at 100% utilization for one minute
or an equivalent combination of vCPUs, utilization, and time (for
example, one vCPU running at 50% utilization for two minutes or two
vCPUs running at 25% utilization for two minutes).
CPU credit metrics are available only at a 5 minute frequency. If you
specify a period greater than five minutes, use the Sum statistic
instead of the Average statistic.
Units: Count
CPUCreditBalance
[T2 instances] The number of CPU credits available for the instance to
burst beyond its base CPU utilization. Credits are stored in the
credit balance after they are earned and removed from the credit
balance after they expire. Credits expire 24 hours after they are
earned.
CPU credit metrics are available only at a 5 minute frequency.
Units: Count
https://docs.aws.amazon.com/AmazonCloudWatch/latest/monitoring/rds-metricscollected.html
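As a rough illustration of the arithmetic (not an exact model of RDS's internal bookkeeping), here is a small Python sketch. It assumes the commonly published t2.small figures of roughly 12 credits earned per hour (a 20% baseline on 1 vCPU) and a balance capped at 24 hours' worth of earnings because of expiry; check the T2 documentation for the exact numbers for your instance class.
# Rough sketch of T2 CPU credit accounting for a db.t2.small.
# Assumed figures (verify against the T2 docs): 1 vCPU, ~12 credits
# earned per hour (20% baseline), balance capped at 24h of earnings.
EARN_PER_HOUR = 12
MAX_BALANCE = EARN_PER_HOUR * 24   # credits expire 24h after being earned

def balance_after(hours, avg_util):
    """Credit balance after running at avg_util (0.0-1.0) for `hours`."""
    balance = MAX_BALANCE              # start from a full balance
    for _ in range(hours):
        spent = 60 * avg_util          # 1 credit = 1 vCPU-minute at 100%
        balance = min(MAX_BALANCE, max(0.0, balance + EARN_PER_HOUR - spent))
    return balance

# At a steady 25% CPU the per-interval CPUCreditUsage looks flat
# (about 1.25 credits per 5 minutes), yet the balance drains by
# roughly 3 credits per hour and reaches zero within a few days.
print(balance_after(hours=96, avg_util=0.25))   # -> 0.0
print(balance_after(hours=96, avg_util=0.15))   # stays at the cap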
When choosing the Enterprise Plan for IBM Event Streams, there is a huge cost associated with the Base Capacity Unit-Hour: it comes to more than $5K per month if I enter 720 hours (assuming one month is 720 hours).
This makes it way too expensive and makes me wonder whether I have understood correctly what "Base Capacity Unit-Hour" means.
Just got this from an IBM rep:
The price is $6.85 per Base Capacity Unit, and that's by the hour. So broken down we have 6.85 * 24 * 30 = $4,932/month, which makes your estimate correct. A Base Capacity Unit covers 150 MB/s (3 brokers) and 2 TB of storage. If you find you need to scale up, then the rate will increase from there.
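To sanity-check that math, here is a tiny Python sketch of the calculation above (the $6.85 hourly rate per unit is taken from the quote; how the rate changes once you scale beyond one Base Capacity Unit is not covered here):
# Cost check based on the rep's figures above.
RATE_PER_UNIT_HOUR = 6.85            # USD per Base Capacity Unit per hour

def monthly_cost(units=1, hours=24 * 30):
    return units * RATE_PER_UNIT_HOUR * hours

print(monthly_cost())                # 4932.0 USD for one unit over a 720-hour month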
I am trying to interpret the metrics fetched by Telegraf with the Kubernetes plugin on a k3s cluster. I see that the results report CPU in nanoseconds, and memory and disk in bytes. More importantly, I would like to understand how the CPU usage shown in nanoseconds can be converted into a percentage.
Below is one such example capture:
kubernetes_pod_container,container_name=telegraf-ds,host=ah-ifc2,namespace=default,node_name=ah-ifc2,pod_name=telegraf-ds-dxdhz, rootfs_available_bytes=73470144512i,logsfs_available_bytes=0i,logsfs_capacity_bytes=0i,cpu_usage_nanocores=243143i,memory_usage_bytes=0i,memory_working_set_bytes=25997312i,memory_major_page_faults=0i,rootfs_used_bytes=95850790i,logsfs_used_bytes=4096i,cpu_usage_core_nanoseconds=4301919390i,memory_rss_bytes=0i,memory_page_faults=0i,rootfs_capacity_bytes=196569534464i 1616950920000000000
Also, how do visualization tools such as Chronograf/Grafana convert this raw data into a more actionable format such as CPU %, or memory/disk utilization %?
Thanks and any advice will help.
If you have a running total of the number of (nano)seconds, you can look at the derivative to figure out percentages.
Example:
At time 00:00:00 the cpu usage counter was at 1,000,000,000ns
At time 00:00:10 the cpu usage counter was at 3,000,000,000ns
From this information we can conclude that during the 10 seconds between 00:00:00 and 00:00:10 the process used the CPU for 3,000,000,000 - 1,000,000,000 = 2,000,000,000 nanoseconds.
In other words, it used the CPU for 2 seconds out of 10, giving us a CPU usage of 20%.
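In code, that derivative is just the difference between two counter samples divided by the elapsed wall-clock time. Here is a minimal Python sketch using the cpu_usage_core_nanoseconds counter from the Telegraf sample above (purely illustrative; in Chronograf/Grafana you would get the same effect with a derivative-type query function such as InfluxQL's non_negative_derivative()):
# Convert a cumulative CPU-time counter (nanoseconds) into a utilization %.
def cpu_percent(prev_ns, curr_ns, prev_t, curr_t, num_cores=1):
    """prev_ns/curr_ns: two samples of cpu_usage_core_nanoseconds,
    prev_t/curr_t: their timestamps in seconds."""
    used_ns = curr_ns - prev_ns            # CPU time consumed in the window
    window_ns = (curr_t - prev_t) * 1e9    # wall-clock length of the window
    return 100.0 * used_ns / (window_ns * num_cores)

# The worked example above: 2,000,000,000 ns used over a 10 s window -> 20%
print(cpu_percent(1_000_000_000, 3_000_000_000, prev_t=0, curr_t=10))               # 20.0
# Dividing by the core count instead gives "percent of the whole node":
print(cpu_percent(1_000_000_000, 3_000_000_000, prev_t=0, curr_t=10, num_cores=4))  # 5.0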
We have a Kafka Streams application with 3 pods. Scaling the application is a heavy operation for us (because of large state), so I would like to add pods only if it is absolutely necessary, for example if the application's utilization stays above a threshold for, let's say, 10 minutes.
Again, I don't need to scale my application up/down for a sudden burst (a few seconds) of messages.
I am looking for a configuration something like:
window : 15 mins
average cpu : 1000 milli
So I would like to scale the application if the average CPU over the 15-minute window is greater than 1000 milli.
You can take a look at HPA scaling policies.
There is stabilizationWindowSeconds:
StabilizationWindowSeconds is the number of seconds for which past
recommendations should be considered while scaling up or scaling down.
StabilizationWindowSeconds must be greater than or equal to zero and
less than or equal to 3600 (one hour). If not set, use the default
values: - For scale up: 0 (i.e. no stabilization is done)
And the target average CPU utilization can be set in the metric target object, under averageUtilization.
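Putting the two together for your case, here is a sketch of what the autoscaling/v2 HPA object could look like, written as a Python dict and dumped to YAML. The Deployment name kafka-streams-app, the replica bounds, and the choice of an absolute averageValue of 1000m (rather than a percentage via averageUtilization) are assumptions made to match your example; adjust them to your setup.
# Sketch of an autoscaling/v2 HorizontalPodAutoscaler: target an average of
# 1000m CPU per pod, and only act after a 15-minute (900 s) stabilization window.
import yaml   # PyYAML

hpa = {
    "apiVersion": "autoscaling/v2",
    "kind": "HorizontalPodAutoscaler",
    "metadata": {"name": "kafka-streams-app"},            # assumed name
    "spec": {
        "scaleTargetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",                          # or StatefulSet
            "name": "kafka-streams-app",                   # assumed name
        },
        "minReplicas": 3,
        "maxReplicas": 6,                                  # assumed upper bound
        "metrics": [{
            "type": "Resource",
            "resource": {
                "name": "cpu",
                # absolute quantity per pod; use averageUtilization for a %
                "target": {"type": "AverageValue", "averageValue": "1000m"},
            },
        }],
        "behavior": {
            # consider 15 minutes of past recommendations before acting, so a
            # few seconds of message burst does not trigger a scale-up
            "scaleUp": {"stabilizationWindowSeconds": 900},
            "scaleDown": {"stabilizationWindowSeconds": 900},
        },
    },
}

print(yaml.safe_dump(hpa, sort_keys=False))   # pipe to: kubectl apply -f -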
I found "rate limit" and "burst limit" at Design section of API Designer,
What is the difference of them?
Rate limit can be set at second, minute, hour, day an week time interval.
On the other hand, burst limit can be set only second and minute time interval.
Does it mean same to set 1/1sec rate limit and to set 1/1sec burst limit?
Different Plans can have differing rate limits, both between operations and for the overall limit. This is useful for providing differing levels of service to customers. For example, a "Demo Plan" might enforce a rate limit of ten calls per minute, while a "Full Plan" might permit up to 1000 calls per second.
You can apply burst limits to your Plans, to prevent usage spikes that might damage infrastructure. Multiple burst limits can be set per Plan, at second and minute time intervals.
That said, these two parameters have a different meaning and could be used together. E.g.: I want to permit a total of 1000 calls per hour (rate limit) and a maximum spike of 50 calls per second (burst limit).
A rate limit enforces how many calls (in total) are possible in a given time frame; after that, no further calls are possible. This is used to create staged plans with different limits and charges (e.g. entry or free, medium, enterprise).
Burst limits are used to manage things like system load by capping the maximum number of calls for a short moment (hence seconds or minutes), to prevent usage spikes. They can be used to make sure the allowed number of API calls (the rate limit) is spread evenly across the set time frame (day, week, month), and also to protect the backend system from overload.
So you could set a rate limit of 1000 API calls per week and a burst limit of 100 calls per minute. If there were 10 "heavy" minutes, the entire weekly rate would be consumed. A user could also make 100+ calls per day and reach the 1000 calls per week that way.
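If it helps to see the interplay, here is a small Python sketch of that worked example, using two fixed counting windows (real gateways such as API Connect may implement the limits differently internally, so treat this purely as an illustration):
# Illustrative only: a weekly rate limit plus a per-minute burst limit,
# modelled as two fixed counting windows.
class PlanLimits:
    def __init__(self, rate_limit=1000, rate_window=7 * 24 * 3600,
                 burst_limit=100, burst_window=60):
        self.rate_limit, self.rate_window = rate_limit, rate_window
        self.burst_limit, self.burst_window = burst_limit, burst_window
        self.rate_count = self.burst_count = 0
        self.rate_start = self.burst_start = 0.0

    def allow(self, now):
        # roll each window over when it expires
        if now - self.rate_start >= self.rate_window:
            self.rate_start, self.rate_count = now, 0
        if now - self.burst_start >= self.burst_window:
            self.burst_start, self.burst_count = now, 0
        if self.rate_count >= self.rate_limit or self.burst_count >= self.burst_limit:
            return False        # rejected: one of the two limits is hit
        self.rate_count += 1
        self.burst_count += 1
        return True

plan = PlanLimits()
# Ten "heavy" minutes of 100 calls each: no single minute exceeds the burst
# limit, yet together they consume the entire 1000-call weekly rate limit.
accepted = sum(plan.allow(now=minute * 60 + i * 0.1)
               for minute in range(12) for i in range(100))
print(accepted)   # 1000 -- everything after that is rejected by the rate limit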
Plesk has built-in health monitoring which lets you configure alarm thresholds for automatic notification. Most of these thresholds are percentage-based, flagging a notification if memory or CPU usage gets too high.
I'm having trouble determining how these percentages are measured. Measuring memory is easy (we're dealing with a fixed figure there), but CPU usage is more complicated on multi-processor servers.
CPU Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 42
Stepping: 7
CPU MHz: 1600.000
BogoMIPS: 6784.52
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Now, am I right in thinking that if a single core hits 90%, this would trigger the alarm in Health Monitoring?
Most of my flags are set from 80% = yellow to 90% = red.
And it's pretty much always on red. I believe this is because the server is multi-core and the health tool is working off a single core.
If I use the top command with Shift and I,
then I get the overall CPU per process, and it's nothing like the total % that the health monitor is showing me.
I don't know if I have picked up false information or been misguided, but maybe someone can help steer me in the right direction, or at least shine a little light on it.
:)
Thanks
After a lot of posts, troubleshooting, and pestering, I finally found the answer.
In short: yes, Plesk Health Monitoring does not really account for more than one core.
So in my case, with 6 cores, when core 6 flicks to 80% for 1 second, the alarm is triggered.
But when you work on an average, the CPU does not even hit 12%.
I asked this over in the official Plesk forum and failed to get a response.
Of the many server companies and Plesk partners, one did respond and said it's a known issue that causes a lot of headaches.
You can increase the load average window from 1 minute to 15 minutes. This will reduce the alerts a lot, or at least it has in my case.
1-minute window with the CPU hitting 80% = alarm.
15-minute window, CPU hits 80% for 30 seconds, and its average over the 15 minutes stays well below the threshold = no alarm :)
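For anyone who wants to see the per-core vs. averaged discrepancy for themselves, here is a small Python sketch that samples /proc/stat twice and prints both the overall average and the busiest core (Linux only, and only a diagnostic aid; it is not how Plesk's Health Monitor measures internally):
# Sample /proc/stat twice and compare per-core utilization with the
# all-core average. A single busy core can sit at 80-90% while the
# average across 8 logical CPUs stays low.
import time

def read_cpu_times():
    """Return {cpu_name: (busy_jiffies, total_jiffies)} from /proc/stat."""
    times = {}
    with open("/proc/stat") as f:
        for line in f:
            if line.startswith("cpu"):
                name, *vals = line.split()
                vals = list(map(int, vals))
                idle = vals[3] + (vals[4] if len(vals) > 4 else 0)  # idle + iowait
                times[name] = (sum(vals) - idle, sum(vals))
    return times

a = read_cpu_times()
time.sleep(2)
b = read_cpu_times()

usage = {}
for name in a:
    busy = b[name][0] - a[name][0]
    total = b[name][1] - a[name][1]
    usage[name] = 100.0 * busy / total if total else 0.0

print("overall:", round(usage["cpu"], 1), "%")           # the "cpu" line aggregates all cores
per_core = {k: v for k, v in usage.items() if k != "cpu"}
print("busiest core:", max(per_core, key=per_core.get),
      round(max(per_core.values()), 1), "%")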