I am trying to create a single metric that tells whether Redshift performance today is better than yesterday.
I know there are several metrics available to measure performance individually:
https://docs.amazonaws.cn/en_us/redshift/latest/mgmt/performance-metrics-perf.html
CPU utilization
Percentage disk space used
Database connections
Health status
Query duration
Query throughput
Concurrency scaling activity
Is it possible to have a single metric that calculates performance based on all the above factors?
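One possible approach, not an established Redshift metric: turn each numeric metric into a today-vs-yesterday ratio, flip the ratio for metrics where lower is better, and combine the ratios with a weighted geometric mean. The metric keys, weights, and values below are hypothetical, all values must be positive, and categorical signals such as health status would need separate handling.

```python
import math

# Hypothetical choice of metrics; True means "higher is better".
HIGHER_IS_BETTER = {
    "cpu_utilization_pct": False,
    "disk_space_used_pct": False,
    "query_duration_s": False,
    "query_throughput_qps": True,
}

# Hypothetical weights (they sum to 1.0); tune them to what matters for your workload.
WEIGHTS = {
    "cpu_utilization_pct": 0.2,
    "disk_space_used_pct": 0.1,
    "query_duration_s": 0.4,
    "query_throughput_qps": 0.3,
}


def performance_index(today: dict[str, float], yesterday: dict[str, float]) -> float:
    """Weighted geometric mean of per-metric improvement ratios (today vs. yesterday).

    1.0 means no change; above 1.0 means today looks better, below 1.0 worse.
    """
    log_total = 0.0
    for name, weight in WEIGHTS.items():
        ratio = today[name] / yesterday[name]
        if not HIGHER_IS_BETTER[name]:
            # Invert "lower is better" metrics so that a drop counts as an improvement.
            ratio = 1.0 / ratio
        log_total += weight * math.log(ratio)
    return math.exp(log_total)


yesterday = {"cpu_utilization_pct": 60.0, "disk_space_used_pct": 40.0,
             "query_duration_s": 10.0, "query_throughput_qps": 100.0}
today = dict(yesterday, query_duration_s=5.0)  # only query duration changed (halved)
print(round(performance_index(today, yesterday), 4))  # prints 1.3195
```

A geometric mean is used so that a 2x improvement in one metric and a 2x regression in an equally weighted one cancel out instead of one dominating.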
We need to calculate the average number of cores and the amount of memory used over a period of time (say, a month) by a particular namespace in K8s. How can we go about doing this?
We want to calculate the cost for each namespace. We tried the Kubecost tool in AKS, but its numbers didn't match the cost shown on the Azure Cost dashboard; in fact, they were way higher than the actual cost.
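Averaging raw samples directly can be skewed when scrape intervals are irregular or have gaps; a time-weighted average avoids that. A minimal sketch, assuming you have already exported (timestamp in seconds, cores in use) samples for the namespace from your metrics store (the export step and the sample format are assumptions); the same function works for memory samples.

```python
def time_weighted_average(samples: list[tuple[float, float]]) -> float:
    """Trapezoidal time-weighted mean of (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples)
    area = 0.0
    # Each interval contributes its duration times the average of its endpoint values.
    for (t0, v0), (t1, v1) in zip(ordered, ordered[1:]):
        area += (t1 - t0) * (v0 + v1) / 2.0
    duration = ordered[-1][0] - ordered[0][0]
    if duration <= 0:
        raise ValueError("samples must span a positive time range")
    return area / duration


# Hypothetical samples: 1 core for the first hour, ramping to 3 cores over the second.
print(time_weighted_average([(0, 1.0), (3600, 1.0), (7200, 3.0)]))  # prints 1.5
```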
We have configured HPA to use 2 metrics:
CPU Utilization
App specific custom metrics
When testing, we observed scaling happening, but how the number of replicas is calculated is not very clear. I am not able to locate any documentation on this.
Questions:
Can someone point to documentation or code on the calculation part?
Is it a good practice to use multiple metrics for scaling?
Thanks in Advance!
From https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-the-horizontal-pod-autoscaler-work
If multiple metrics are specified in a HorizontalPodAutoscaler, this calculation is done for each metric, and then the largest of the desired replica counts is chosen. If any of those metrics cannot be converted into a desired replica count (e.g. due to an error fetching the metrics from the metrics APIs), scaling is skipped.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization-window flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
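The quoted algorithm can be sketched as follows. The per-metric formula desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)] and the default 0.1 tolerance come from the same Kubernetes page; the function names are hypothetical, the missing-metric rule follows the quoted text literally, and the real controller additionally handles unready pods, minReplicas/maxReplicas and scaling policies, which this sketch leaves out.

```python
import math
from typing import Optional

# Default of the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance.
TOLERANCE = 0.1


def replicas_for_metric(current_replicas: int, current_value: float, target_value: float) -> int:
    """desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)."""
    ratio = current_value / target_value
    if abs(ratio - 1.0) <= TOLERANCE:
        return current_replicas  # close enough to the target: no change
    return math.ceil(current_replicas * ratio)


def desired_replicas(current_replicas: int,
                     metrics: list[Optional[tuple[float, float]]]) -> int:
    """Largest recommendation across metrics; None models a metric that could not be fetched."""
    if any(m is None for m in metrics):
        return current_replicas  # scaling is skipped
    return max(replicas_for_metric(current_replicas, cur, tgt) for cur, tgt in metrics)


def stabilized_recommendation(recommendations_in_window: list[int]) -> int:
    """Highest recommendation recorded within the stabilization window (5 minutes by default)."""
    return max(recommendations_in_window)


# 4 replicas; CPU at 90% vs. a 60% target, custom metric at 250 vs. a target of 100.
print(desired_replicas(4, [(90.0, 60.0), (250.0, 100.0)]))  # prints 10
```

Here the CPU metric alone would ask for 6 replicas, but the custom metric asks for 10, so 10 wins. On the way down, the stabilization window keeps the highest recent recommendation, so a drop from 10 to 5 is applied only once 10 has aged out of the window.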
Horizontal scaling means that we scale by adding more machines to the pool of resources. Still, there is a choice of how much power (CPU, RAM) each node in the cluster will have.
When a cluster is managed with Kubernetes, it is extremely easy to set any CPU and memory limit for Pods. How do you choose the optimal CPU and memory size for cluster nodes (or Pods in Kubernetes)?
For example, there are 3 nodes in a cluster with 1 vCPU and 1GB RAM each. To handle more load there are 2 options:
Add the 4th node with 1 vCPU and 1GB RAM
Add to each of the 3 nodes more power (e.g. 2 vCPU and 2GB RAM)
A straightforward solution is to calculate the throughput and cost of each option and choose the cheaper one. Are there any more advanced approaches for choosing the compute resources of the nodes in a cluster with horizontal scalability?
For this particular example, I would go for 2x vCPU instead of another 1 vCPU node, but that is mainly because I believe running an OS for anything serious on a single vCPU is just wrong. To behave decently, a system needs 2+ cores available; otherwise it's too easy to overwhelm that one vCPU and bring the node to its knees. There is no ideal algorithm for this, though. It will depend on your budget, on the characteristics of your workloads, etc.
As a rule of thumb, don't stick to instances that are too small, as you have a bunch of stuff that always has to run on them regardless of their size, and the more nodes, the more overhead. 3x 4 vCPU + 16/32 GB RAM sounds like a nice plan for starters, but again... it depends on what you want, need and can afford.
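To make the per-node overhead argument concrete, here is a minimal sketch comparing the two options from the question by the capacity left over for workloads. The per-node overhead (OS, kubelet, system pods) and the prices are hypothetical, and pricing is assumed to scale linearly with node size.

```python
from dataclasses import dataclass

# Hypothetical fixed cost of every node, regardless of its size.
OVERHEAD_VCPU = 0.25
OVERHEAD_RAM_GB = 0.5


@dataclass
class Option:
    name: str
    nodes: int
    vcpu_per_node: float
    ram_gb_per_node: float
    price_per_node_hour: float  # hypothetical price


def usable(option: Option) -> tuple[float, float]:
    """Total vCPU and RAM left for workloads after subtracting per-node overhead."""
    vcpu = option.nodes * max(option.vcpu_per_node - OVERHEAD_VCPU, 0.0)
    ram = option.nodes * max(option.ram_gb_per_node - OVERHEAD_RAM_GB, 0.0)
    return vcpu, ram


def cost_per_usable_vcpu(option: Option) -> float:
    vcpu, _ = usable(option)
    return option.nodes * option.price_per_node_hour / vcpu


add_node = Option("4 x 1 vCPU / 1 GB", 4, 1.0, 1.0, 0.01)
scale_up = Option("3 x 2 vCPU / 2 GB", 3, 2.0, 2.0, 0.02)
for opt in (add_node, scale_up):
    vcpu, ram = usable(opt)
    print(opt.name, vcpu, ram, round(cost_per_usable_vcpu(opt), 4))
```

With these assumptions, the 1 vCPU nodes lose 25% of their CPU to overhead versus 12.5% for the 2 vCPU nodes, so the larger nodes come out cheaper per usable vCPU.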
The answer is related to such performance metrics as latency and throughput:
Latency is the time interval between sending a request and receiving the response.
Throughput is the request processing rate (requests per second).
Latency has influence on throughput: for a fixed number of requests in flight, higher latency means lower throughput.
If a business transaction consists of multiple sequential calls to services that can't be parallelized, then compute resources (CPU and memory) have to be chosen based on the desired latency. Adding more instances of the services (horizontal scaling) will not have any positive influence on latency in this case.
Adding more instances of a service increases throughput, allowing more requests to be processed in parallel (if there are no bottlenecks).
In other words, allocate CPU and memory so that the service has the desired response time, and add more service instances (scale horizontally) to handle more requests in parallel.
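The relationship can be made concrete with Little's law (requests in flight = throughput x latency). A minimal sketch, assuming each instance serves a fixed, hypothetical number of concurrent requests and there are no other bottlenecks:

```python
def chain_latency_s(call_latencies_s: list[float]) -> float:
    """End-to-end latency of calls that must run one after another."""
    return sum(call_latencies_s)


def max_throughput_rps(instances: int, concurrency_per_instance: int, latency_s: float) -> float:
    """Little's law (L = lambda * W) solved for the rate: lambda = L / W."""
    return instances * concurrency_per_instance / latency_s


# Hypothetical per-call latencies; note that the instance count does not appear here.
latency = chain_latency_s([0.125, 0.25, 0.125])  # 0.5 s
for n in (1, 3):
    print(n, latency, max_throughput_rps(n, 4, latency))
```

Tripling the instances triples the throughput (8 to 24 requests per second here) but leaves the 0.5 s end-to-end latency unchanged; only faster individual calls, for example from more CPU per instance, reduce it.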
I'm investigating an approximately 3 hour period of increased query latency on a production Postgres RDS instance (m4.xlarge, 400 GiB of gp2 storage).
The driver seems to be a spike in both read and write disk latencies: I see them going from a baseline of ~0.0005 seconds up to a peak of 0.0136 seconds write latency / 0.0081 seconds read latency.
I also see an increase in disk queue depth from a baseline of around 2, to a peak of 14.
When there's a spike in disk latencies, I generally expect to see an increase in data being written to disk. But read IOPS, write IOPS, read throughput, and write throughput all went down (by approximately 50%) during the time when latency was elevated.
I also have server-side metrics on the total query volume I'm sending (measured in both queries per second and amount of data written: this is a write-heavy workload), and those metrics were flat during this time period.
I'm at a loss for what to investigate next. What are possible reasons that disk latency could increase while IOPS go down?
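Little's law also applies to a disk queue: average queue depth = IOPS x average latency, so average latency = queue depth / IOPS. If requests keep queuing while the number of completed operations per second falls, the average latency has to rise, so these three metrics moving in opposite directions are consistent with each other; the relation does not by itself explain why the volume is completing fewer operations. A minimal sketch with hypothetical numbers (not the ones above), assuming the averages cover the same period:

```python
def avg_latency_s(avg_queue_depth: float, iops: float) -> float:
    """Little's law for a disk queue: queue depth = IOPS * latency."""
    return avg_queue_depth / iops


# Hypothetical values: deeper queue, half the completed IOPS.
print(avg_latency_s(2, 1000))  # prints 0.002
print(avg_latency_s(14, 500))  # prints 0.028
```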
Athena has some default service limits that can help roughly cap the cost of accidental "runaway" queries on a large data lake in S3. They are not great (they are based on time, not volume of data scanned), but they are still helpful.
What about Redshift Spectrum?
What mechanisms does it provide that can easily be used to cap cost or mitigate the risk of "accidentally" scanning too much data in a single runaway query against S3? What's a good way of tackling this problem?
Amazon Redshift allows you to apply granular controls over Spectrum query execution using WLM Query Monitoring Rules.
There are 2 Spectrum metrics available: Spectrum scan size (number of MB scanned by the query) and Spectrum scan row count (number of rows scanned by the query).
You can also use Query execution time to enforce a maximum duration but this will apply to all query types not just Spectrum.
Please note that these are sampled metrics. Queries are not aborted at precisely the point when they exceed the rule, they are aborted at the next sample interval.
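To see what "aborted at the next sample interval" means in practice, here is a small sketch that assumes a constant scan rate and a fixed sample interval; both values are hypothetical, since the actual sampling interval is not stated here.

```python
def scanned_mb_at_abort(rate_mb_per_s: int, threshold_mb: int, sample_interval_s: int) -> int:
    """MB scanned when a '>' rule is first seen as breached at a sample boundary.

    Assumes a constant scan rate and samples at t = interval, 2 * interval, ...
    """
    per_sample = rate_mb_per_s * sample_interval_s
    # Smallest k with k * per_sample > threshold_mb.
    first_breaching_sample = threshold_mb // per_sample + 1
    return first_breaching_sample * per_sample


print(scanned_mb_at_abort(100, 1000, 4))   # prints 1200
print(scanned_mb_at_abort(100, 1000, 15))  # prints 1500
```

The overshoot can approach one full interval's worth of scanning, so it is worth setting the threshold somewhat below the amount you actually want to cap.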
If you have already been running Spectrum queries on your cluster, you can get started with QMR by using our script wlm_qmr_rule_candidates to generate candidate rules. The generated rules are based on the 99th percentile of each metric.
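As a sketch of what such rules look like, the snippet below builds a manual-WLM configuration with two abort rules on the Spectrum metrics. The resulting string is the kind of value that goes into the cluster parameter group's wlm_json_configuration parameter (for example via the console or boto3's modify_cluster_parameter_group). The key names follow the manual-WLM JSON format as I understand it and should be checked against the current Redshift documentation; the rule names and thresholds are hypothetical.

```python
import json


def spectrum_guard_queue(max_scan_mb: int, max_scan_rows: int, query_concurrency: int = 5) -> dict:
    """One manual-WLM queue with QMR rules that abort oversized Spectrum scans."""
    return {
        "query_concurrency": query_concurrency,
        "rules": [
            {
                "rule_name": "abort_big_spectrum_scan_mb",  # hypothetical name
                "predicate": [
                    {"metric_name": "spectrum_scan_size_mb", "operator": ">", "value": max_scan_mb}
                ],
                "action": "abort",
            },
            {
                "rule_name": "abort_big_spectrum_scan_rows",  # hypothetical name
                "predicate": [
                    {"metric_name": "spectrum_scan_row_count", "operator": ">", "value": max_scan_rows}
                ],
                "action": "abort",
            },
        ],
    }


# Hypothetical thresholds: about 100 GB scanned or 1 billion rows per query.
wlm_json_configuration = json.dumps([spectrum_guard_queue(100_000, 1_000_000_000)], indent=2)
print(wlm_json_configuration)
```

Using "log" instead of "abort" as the action is a gentler way to observe how often a threshold would fire before enforcing it.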