When creating a capacity provider in AWS ECS, we fill in a Target capacity % value; once the cluster crosses this value, it scales in. I am curious how the current value for the cluster is calculated, and where I can check it. I have not found any data on the CloudWatch side.
See this blog post and related documentation.
You can see the "Capacity Provider Reservation" value in CloudWatch Metrics under "AWS/ECS/ManagedScaling"
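To make that number concrete: as I understand the AWS write-up on ECS cluster auto scaling, the metric is M / N * 100, where N is the number of instances currently running in the Auto Scaling group and M is the number ECS estimates it needs. A minimal sketch of the arithmetic follows. The function name is mine, the way ECS estimates M is not shown, and the two N = 0 special cases are how I remember the blog post describing them, so treat them as an assumption.

```python
def capacity_provider_reservation(instances_needed: int, instances_running: int) -> float:
    """Return the reservation percentage, i.e. M / N * 100.

    instances_needed  (M): instances ECS estimates are required.
    instances_running (N): instances currently in the Auto Scaling group.
    """
    if instances_running == 0:
        # Assumed special cases: 200 when capacity is needed but none exists,
        # 100 when nothing is needed and nothing is running.
        return 200.0 if instances_needed > 0 else 100.0
    return 100.0 * instances_needed / instances_running


# With a Target capacity % of 100, a value above 100 means scale out,
# a value below 100 means scale in.
print(capacity_provider_reservation(12, 10))  # 120.0
print(capacity_provider_reservation(8, 10))   # 80.0
```

So with 10 instances running and only 8 needed, the metric reads 80, which is below a target of 100 and leads to a scale-in.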
For ECS Capacity Providers using managed scaling, you will have an Auto Scaling group associated with the Capacity Provider. The Auto Scaling group will have a Target Tracking scaling policy associated with it, which tracks a metric (often CPU utilization, but it could be whatever is most appropriate for your solution).
A Target tracking autoscaling policy tracks a target value for the metric. When using ECS Capacity providers with managed scaling the Target Capacity % you configure for the Capacity Provider is used as the Target Value for the Target Tracking scaling policy.
So, as an example, if your Target Tracking autoscaling policy is tracking CPUUtilization and you specify a Target Capacity % of 60%, then the Capacity Provider will work on a best-effort basis to keep aggregate CPU utilization at 60%. This will result in a scale-out event when CPUUtilization is greater than 60% and a scale-in event when it is less than 60%.
You can see the scaling events in the Alarms view of the AWS CloudWatch console as the scale-out or scale-in actions are triggered. You will be able to see the metric your Target Tracking autoscaling policy is tracking in the Metrics view of the AWS CloudWatch console.
Related
We currently have a GKE environment with several HPAs for different deployments. All of them work just fine out of the box, but sometimes our users still experience some delay during peak hours.
Usually this delay is the time it takes the new instances to start and become ready.
What I'd like is a way to have an HPA that could predict usage and scale eagerly before it is needed.
The simplest implementation I could think of is just an HPA that takes the average usage of previous days and, in advance (say 10 minutes earlier), scales up or down based on the historic usage for the current time frame.
Is there anything like that in vanilla k8s or GKE? I was unable to find anything like that in GCP's docs.
If you want to scale your applications based on events/custom metrics, you can use KEDA (Kubernetes-based Event Driven Autoscaler), which supports scaling based on GCP Stackdriver, Datadog or Prometheus metrics (and many other scalers).
What you need to do is create some queries to get the CPU usage at the moment CURRENT_TIMESTAMP - 23H50M (or the aggregated value for the last week), then define some thresholds to scale your application up or down.
If you have trouble doing this with your monitoring tool, you can create a custom metrics API that queries the monitoring API and aggregates the values (with the time shift) before sending them to the metrics-api scaler.
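A rough sketch of the time-shift idea, in case it helps. This is not KEDA code; it only shows the calculation a custom metrics endpoint could do before handing a value to a scaler. The function name, the sample format (timestamp, total CPU cores used) and the cores_per_replica sizing are all assumptions for illustration.

```python
import math
from datetime import datetime, timedelta


def predicted_replicas(samples, now, lookback=timedelta(hours=23, minutes=50),
                       cores_per_replica=0.5, min_replicas=1, max_replicas=20):
    """Size the deployment from what happened `lookback` ago.

    samples: list of (datetime, total_cpu_cores_used) from the monitoring system.
    With a lookback of 23h50m we look at yesterday's load 10 minutes ahead of now.
    """
    target_time = now - lookback
    # Pick the historical sample closest to the shifted timestamp.
    _, usage = min(samples, key=lambda s: abs(s[0] - target_time))
    wanted = math.ceil(usage / cores_per_replica)
    # Clamp to the allowed replica range.
    return max(min_replicas, min(max_replicas, wanted))


history = [
    (datetime(2024, 1, 1, 9, 0), 1.0),
    (datetime(2024, 1, 1, 9, 10), 4.2),  # yesterday's peak started here
]
print(predicted_replicas(history, now=datetime(2024, 1, 2, 9, 0)))  # 9
```

At 09:00 today it sizes for yesterday's 09:10 load, so the extra pods are already starting before the peak arrives.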
HPA - How to avoid scaling-up for CPU utilization spike (not on startup)
When the business configuration is loaded for a different country, CPU load increases for 1 minute, but we want to avoid scaling up for that 1 minute.
Is CurrentMetricValue just the current value of the metric, or an average over the period between the last poll and the current poll (--horizontal-pod-autoscaler-sync-period)?
The default HPA check interval is 30 seconds. As you mentioned, this can be configured by changing the value of the controller manager's --horizontal-pod-autoscaler-sync-period flag.
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager’s --horizontal-pod-autoscaler-sync-period flag.
During each period, the controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition. The controller manager obtains the metrics from either the resource metrics API (for per-pod resource metrics), or the custom metrics API (for all other metrics).
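For reference, the calculation the controller does on each sync, as given in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), and nothing changes while the ratio is within a tolerance (0.1 by default). A minimal sketch of just that formula; the function name is mine and it leaves out min/max replicas, missing metrics and not-yet-ready pods.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric, tolerance=0.1):
    """HPA core formula: ceil(currentReplicas * current / target)."""
    ratio = current_metric / target_metric
    # Inside the tolerance band the controller leaves the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


print(desired_replicas(4, 90, 60))  # 6: a spike to 90% against a 60% target
print(desired_replicas(4, 63, 60))  # 4: within tolerance, no change
```

current_metric here is whatever the metrics API reports at the moment of the sync, so a one-minute spike that spans a sync will show up in the recommendation.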
In order to change or add flags in kube-controller-manager, you need access to the /etc/kubernetes/manifests/ directory on the master node and must be able to modify the parameters in /etc/kubernetes/manifests/kube-controller-manager.yaml.
Note: you are not able to do this on GKE, EKS and other managed clusters.
What is more, I recommend increasing --horizontal-pod-autoscaler-downscale-stabilization (the replacement for --horizontal-pod-autoscaler-upscale-delay).
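To show what the downscale stabilization window does: as I read the Kubernetes docs, the controller remembers its recent recommendations and, when scaling down, acts on the highest one inside the window (5 minutes by default). A small sketch of that behaviour; the class and method names are made up for illustration and this is not the controller's actual code.

```python
from collections import deque


class DownscaleStabilizer:
    """Keep recent recommendations and return the highest one in the window."""

    def __init__(self, window_seconds=300):
        self.window_seconds = window_seconds
        self.history = deque()  # (timestamp_seconds, recommended_replicas)

    def stabilize(self, now, recommendation):
        self.history.append((now, recommendation))
        # Forget recommendations that are older than the window.
        while self.history[0][0] < now - self.window_seconds:
            self.history.popleft()
        return max(rec for _, rec in self.history)


s = DownscaleStabilizer(window_seconds=300)
print(s.stabilize(0, 10))   # 10
print(s.stabilize(60, 4))   # 10: the earlier 10 is still inside the window
print(s.stabilize(301, 4))  # 4: the 10 has aged out
```

Note that this only delays scale-downs; a higher recommendation passes straight through.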
If you're worried about long outages, I would recommend setting up a custom metric (1 if the network was down in the last ${duration}, 0 otherwise) and setting the target value of the metric to 1 (in addition to CPU-based autoscaling). This way:
If the network was down in the last ${duration}, the recommendation based on the custom metric will be equal to the current size of your deployment. The max of this recommendation and a very low CPU recommendation will be equal to the current size of the deployment. There will be no scale-downs until connectivity is restored (plus a few minutes after that because of the scale-down stabilization window).
If the network is available, the recommendation based on the metric will be 0. Maxed with the CPU recommendation, it will be equal to the CPU recommendation, and the autoscaler will operate normally.
I think this solves your issue better than limiting the size of the autoscaling step. Limiting the step size will only slow down the rate at which the number of pods decreases, so a longer network outage will still result in your deployment shrinking to the minimum allowed size.
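The two cases above can be sketched as follows. It uses the usual HPA formula per metric and takes the max across metrics; tolerance and min/max replicas are left out to keep it short, and the function names are hypothetical.

```python
import math


def metric_recommendation(current_replicas, value, target):
    """Per-metric HPA recommendation: ceil(currentReplicas * value / target)."""
    return math.ceil(current_replicas * value / target)


def combined_recommendation(current_replicas, cpu_util, cpu_target, network_down):
    cpu_rec = metric_recommendation(current_replicas, cpu_util, cpu_target)
    # Custom metric: 1 if the network was down recently, 0 otherwise; target 1.
    net_rec = metric_recommendation(current_replicas, 1 if network_down else 0, 1)
    # With several metrics the HPA acts on the largest recommendation.
    return max(cpu_rec, net_rec)


# Outage: CPU looks idle, but the custom metric pins the current size.
print(combined_recommendation(10, cpu_util=5, cpu_target=50, network_down=True))   # 10
# Network fine: the CPU recommendation wins.
print(combined_recommendation(10, cpu_util=5, cpu_target=50, network_down=False))  # 1
```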
You can also use memory-based scaling.
Since it is not possible to create a memory-based HPA in Kubernetes, a script has been written to achieve the same result. You can find our script at this link:
https://github.com/powerupcloud/kubernetes-1/blob/master/memory-based-autoscaling.sh
Clone the repository:
https://github.com/powerupcloud/kubernetes-1.git
and then go to the Kubernetes directory. Execute the help command to get the instructions:
./memory-based-autoscaling.sh --help
Read more here: memory-based-autoscaling.
Based on this link, auto scaling of instances or partitions is provided by Service Fabric.
However, what's confusing is whether it can also auto-scale the nodes (VMs / actual physical environment) in and out, which does not seem to be mentioned explicitly.
Yes, you can auto scale the cluster as well, assuming that you are running in Azure. This is done based on performance counter data. It works by defining rules on the VM scale set.
Note that in order to automatically scale down gracefully, it's recommended you use the durability level Gold or Silver, otherwise you'll be responsible to drain the node before it's taken out of the cluster.
More info here and here.
I need a breakdown of my usage inside a single project, categorized by Pods, Services or Deployments, but the billing section in the console doesn't seem to provide such granular information. Is it possible to get this data somehow? I want to know the network + compute cost per deployment or per pod.
Or is it at least possible to have it at the cluster level? Is this breakdown available in BigQuery?
A new feature was recently released in GKE that collects usage metrics inside a cluster. These can be combined with the exported billing data to separate costs per project/environment, making it possible to break costs down per namespace, deployment and labels, among other criteria.
https://cloud.google.com/blog/products/containers-kubernetes/gke-usage-metering-whose-line-item-is-it-anyway
It's not possible at the moment to break down the billing at the pod, service or deployment level. Kubernetes Engine uses Google Compute Engine instances for the nodes in the cluster, and you are billed for each of those instances according to Compute Engine's pricing until the nodes are deleted. Compute Engine resources are billed on a per-second basis with a 1-minute minimum usage cost.
Export Billing Data to BigQuery enables you to export your daily usage and cost estimates automatically throughout the day to a BigQuery dataset you specify. You can then run BigQuery queries on the exported billing data to do some breakdown.
You can also view your usage reports and estimate your Kubernetes charges using the GCP Pricing Calculator. If you want to take this further, you can file a Public Issue Tracker (PIT) request as a feature request.
You can get this visibility with your GKE Usage Metering dataset and your BigQuery cost exports.
Cost per namespace, cost per deployment, per node can be obtained by writing queries to combine these tables. If you have labels set, you can drilldown based on labels too. It shows you what's the spend on CPU, RAM, and egress cost.
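To give an idea of what "combining these tables" boils down to, here is a toy sketch in plain Python rather than BigQuery SQL. The row layout (namespace, sku_id, amount) and the per-SKU unit prices are simplified assumptions for illustration; the real export schemas have more fields and the join is normally done in a query.

```python
from collections import defaultdict


def cost_per_namespace(usage_rows, unit_price_by_sku):
    """Attribute cost to namespaces: usage amount times the SKU's unit price.

    usage_rows: dicts with 'namespace', 'sku_id' and 'amount' (hypothetical layout
    standing in for the usage metering export).
    unit_price_by_sku: price per unit of usage, derived from the billing export.
    """
    totals = defaultdict(float)
    for row in usage_rows:
        totals[row["namespace"]] += row["amount"] * unit_price_by_sku[row["sku_id"]]
    return dict(totals)


usage = [
    {"namespace": "web", "sku_id": "cpu", "amount": 10},
    {"namespace": "web", "sku_id": "ram", "amount": 4},
    {"namespace": "batch", "sku_id": "cpu", "amount": 2},
]
prices = {"cpu": 0.5, "ram": 0.25}
print(cost_per_namespace(usage, prices))  # {'web': 6.0, 'batch': 1.0}
```

Grouping by a label instead of the namespace works the same way, as long as the label is present on the usage rows.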
Check out economize.cloud - it integrates with your datasets and allows you to slice and dice views. For example, cost per customer or cost per service can be obtained with such granular cost data.
https://www.economize.cloud/blog/gke-usage-cost-monitoring-kubernetes-clusters/
New GCP offering: GKE Cost Allocation allows users to easily and natively view and manage the cost of a GKE cluster by cluster, namespace, pod labels and more, right from the Billing page, or to export detailed usage cost data to BigQuery:
https://cloud.google.com/kubernetes-engine/docs/how-to/cost-allocations
GKE Cost Allocation is more accurate and robust compared to GKE Usage Metering.
Kubecost provides Kubernetes cost allocation by any concept, e.g. pod, service, controller, etc. It's open source and is available for GKE, AWS/EKS, and other major providers. https://github.com/kubecost/cost-model
When I resize a replication controller using kubectl and the cluster does not have enough resources, one or more pods will stay in the Pending state indefinitely.
Is there any tool that will automatically resize the GKE cluster when resources are running out?
I had a similar requirement (for the Go build system): wanted to know when scheduled vs. available CPU or memory was > 1, and scale out nodes when that was true (or, more accurately, when it was ~.8). There's not a built-in metric, but as you suggest you can do it with a custom metric.
This was all done in Go, but it will give you the basic idea:
Create the metrics (memory and CPU, in my case)
Put values to the metrics
The key takeaway IMO is that you have to iterate over each pod in the cluster to determine how much capacity is consumed, then iterate over each node in the cluster to determine how much capacity is available. It's then just a matter of pointing your autoscaler to the custom metric(s).
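Not the original Go code, but the two loops look roughly like this in Python. Pods and nodes are plain dicts here (an assumed shape; in practice they would come from the Kubernetes API), and the function name is made up.

```python
def scheduled_to_available(pods, nodes, resource):
    """Ratio of requested to allocatable capacity for one resource."""
    # Iterate over every pod to see how much capacity is consumed.
    requested = sum(p["requests"].get(resource, 0) for p in pods)
    # Iterate over every node to see how much capacity is available.
    allocatable = sum(n["allocatable"].get(resource, 0) for n in nodes)
    if allocatable == 0:
        return float("inf") if requested else 0.0
    return requested / allocatable


pods = [
    {"requests": {"cpu": 2.0, "memory": 4}},
    {"requests": {"cpu": 1.5}},
]
nodes = [{"allocatable": {"cpu": 4.0, "memory": 16}}]

ratio = scheduled_to_available(pods, nodes, "cpu")
print(ratio)        # 0.875
print(ratio > 0.8)  # True: time to add a node
```

The resulting ratio is the value you would write to the custom metric on each pass.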
Big big big thing worth noting: I ultimately determined that scaling on the built-in CPU utilization metric was just as good as (if not better than, but more on that in a bit) the custom metric. Each pod we scheduled pegged the CPU, so when pods were maxed out, so was CPU. The built-in CPU utilization metric is probably better because you don't have the latency that comes with periodically putting custom metrics.
You can turn on autoscaling for the Instance Group that your GKE nodes belong to.