In the Kubernetes documentation here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#algorithm-details
there is this passage:
For object metrics and external metrics, a single metric is fetched, which describes the object in question. This metric is compared to the target value, to produce a ratio as above. In the autoscaling/v2beta2 API version, this value can optionally be divided by the number of pods before the comparison is made.
I need to do exactly this: divide my current metric by the current number of pods.
Where can I find the specification for this API? I have googled frantically to see what the autoscaling YAML specification for doing this is, but I cannot find it. I.e., I need to write the autoscaler resource as part of our Helm chart.
The specification for the Kubernetes API can be found here: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/
The above is for Kubernetes version 1.18; you'll have to switch to the right version for your cluster.
The spec for HPA v2beta2 is here: https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.18/#horizontalpodautoscaler-v2beta2-autoscaling
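Concretely, the "optionally divided by the number of pods" behaviour corresponds to a target of type AverageValue on an Object or External metric in that v2beta2 spec. As a minimal sketch, assuming an External metric (the metric name, selector, and numbers below are placeholders to adapt to your Helm chart):

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app                        # placeholder
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app                      # placeholder
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: my_external_metric      # placeholder metric name
        selector:
          matchLabels:
            queue: my-queue           # placeholder selector
      target:
        type: AverageValue            # divide the metric by the current number of pods
        averageValue: "30"            # placeholder per-pod target

With target.type set to Value instead, the raw metric is compared to the target directly, without dividing by the pod count.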
Related
In my Kubernetes environment I'm using Prometheus.
There I can view lots of metrics such as kube_node_status_allocatable, kube_resourcequota, and so on.
I want to know what these metrics mean, but I can't find any documentation about them.
In the kubernetes/kube-state-metrics repo I found this page: https://github.com/kubernetes/kube-state-metrics/blob/main/docs/resourcequota-metrics.md
But the explanation there is very short.
I want to know where I can find the official documentation for these metrics.
In particular, I want to figure out what the type label in the kube_resourcequota metric means.
I have already checked the Prometheus homepage, but it isn't covered there.
Do I have to dig through the source code?
Do you mean you want to know what a resource quota is?
https://kubernetes.io/docs/concepts/policy/resource-quotas/
You'll find that kube_resourcequota reports the same information as kubectl get resourcequotas -A.
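For what it's worth, the type label distinguishes the quota's configured limit from its current consumption, i.e. the same hard/used pair that kubectl get resourcequotas prints. A sketch of the kind of series you should see (namespace, quota name, and values here are made up):

kube_resourcequota{namespace="team-a", resourcequota="compute-quota", resource="requests.cpu", type="hard"} 4
kube_resourcequota{namespace="team-a", resourcequota="compute-quota", resource="requests.cpu", type="used"} 1.5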
Is there a DataDog metric to report the space used or remaining in a GCP PersistentVolume? I have found disk-usage metrics for the container itself, but not for a PersistentVolume.
I am working in Google Cloud Platform.
Found the answer:
The metric kubernetes.kubelet.volume.stats.used_bytes will give you the disk usage of PersistentVolumes; you can scope or group it by the persistentvolumeclaim tag on this metric.
You can view this metric and tag combination in your account here: https://app.datadoghq.com/metric/summary?filter=kubernetes.kubelet.volume.stats.used_bytes&metric=kubernetes.kubelet.volume.stats.used_bytes
Documentation on this metric can be found here: https://docs.datadoghq.com/agent/kubernetes/data_collected/#kubelet
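As an illustration, a query along these lines should graph usage for a single claim (the PVC name is a placeholder):

avg:kubernetes.kubelet.volume.stats.used_bytes{persistentvolumeclaim:my-pvc}

Dividing it by the corresponding capacity metric (assuming your Agent also reports kubernetes.kubelet.volume.stats.capacity_bytes) in a dashboard formula gives an approximate percent used.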
Google publishes a tutorial for using custom metrics to drive the HorizontalPodAutoscaler here, and this tutorial contains instructions for:
Using a Kubernetes manifest to deploy the custom metrics adapter into a custom-metrics namespace.
Deploying a dummy application to generate metrics.
Configuring the HPA to use custom metrics.
We are deploying into a default cluster without any special VPC rules, and we have roughly followed the tutorial's guidance, with a few exceptions:
We're using Helm v2, and rather than grant cluster admin role to Tiller, we have granted all of the necessary cluster roles and role bindings to allow the custom-metrics-adapter-deploying Kubernetes manifest to work. We see no issues there; at least the custom metrics adapter spins up and runs.
We have defined some custom metrics that are based upon data extracted from a jsonPayload in Stackdriver logs.
We have deployed a minute-by-minute CronJob that reads the above metrics and publishes a derived metric, which is the value we want to use to drive the autoscaler. The CronJob is working, and we can see values for the derived metric, on a per-Pod basis, in the metrics explorer.
We're configuring the HPA to scale based on the average of the derived metric across all of the pods belonging to a StatefulSet (the HPA has a metrics entry with type Pods; a sketch of such an entry is shown after the error message below). However, the HPA is unable to read our derived metric. We see this error message:
failed to get object metric value: unable to get metric xxx_scaling_metric: no metrics returned from custom metrics API
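For reference, the Pods-type metrics entry described above looks roughly like this in autoscaling/v2beta2 (the target value is a placeholder):

metrics:
- type: Pods
  pods:
    metric:
      name: xxx_scaling_metric
    target:
      type: AverageValue
      averageValue: "100"            # placeholder per-pod target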
Update
We were seeing DNS errors, but these were apparently false alarms, perhaps left over in the logs from while the cluster was spinning up.
We restarted the Stackdriver metrics adapter with the command line option --v=5 to get some more verbose debugging. We see log entries like these:
I0123 20:23:08.069406 1 wrap.go:47] GET /apis/custom.metrics.k8s.io/v1beta1/namespaces/defaults/pods/%2A/xxx_scaling_metric: (56.16652ms) 200 [kubectl/v1.13.11 (darwin/amd64) kubernetes/2e298c7 10.44.1.1:36286]
I0123 20:23:12.997569 1 translator.go:570] Metric 'xxx_scaling_metric' not found for pod 'xxx-0'
I0123 20:23:12.997775 1 wrap.go:47] GET /apis/custom.metrics.k8s.io/v1beta2/namespaces/default/pods/%2A/xxx_scaling_metric?labelSelector=app%3Dxxx: (98.101205ms) 200 [kube-controller-manager/v1.13.11 (linux/amd64) kubernetes/56d8986/system:serviceaccount:kube-system:horizontal-pod-autoscaler 10.44.1.1:36286]
So it looks to us as if the HPA is making the right query for pods-based custom metrics. If we ask the custom metrics API what data it has, and filter with jq to our metric of interest, we see:
{"kind":"MetricValueList",
"apiVersion":"custom.metrics.k8s.io/v1beta1",
"metadata: {"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/xxx_scaling_metric"},
"items":[]}
That the items array is empty is troubling. Again, we can see data in the metrics explorer, so we're left to wonder if our CronJob app that publishes our scaling metric is supplying the right fields in order for the data to be saved in Stackdriver or exposed through the metrics adapter.
For what it's worth the resource.labels map for the time series that we're publishing in our CronJob looks like:
{'cluster_name': 'test-gke',
'zone': 'us-central1-f',
'project_id': 'my-project-1234',
'container_name': '',
'instance_id': '1234567890123456789',
'pod_id': 'xxx-0',
'namespace_id': 'default'}
We finally solved this. Our CronJob that's publishing the derived metric we want to use is getting its raw data from two other metrics that are extracted from Stackdriver logs, and calculating a new value that it publishes back to Stackdriver.
We were using the resource labels that we saw on those metrics when publishing our derived metric. The POD_ID resource label value in the "input" Stackdriver metrics we are reading is the name of the pod. However, the Stackdriver custom metrics adapter at gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0 enumerates the pods in a namespace and asks Stackdriver for data associated with the pods' UIDs, not their names. (We had to read the adapter's source code to figure this out.)
So our CronJob now builds a map of pod names to pod UIDs (which requires RBAC permissions to get and list pods), and publishes the derived metric we use for the HPA with POD_ID set to the pod's UID instead of its name.
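The RBAC piece of that is small; a minimal sketch of the kind of Role bound to the CronJob's service account (name and namespace are placeholders):

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: pod-reader                    # placeholder
  namespace: default
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "list"]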
The reason that published examples of custom metrics for HPA (like this) work is that they use the Downward API to get a pod's UID and provide that value as "POD_ID". In retrospect, that should have been obvious if we had looked at how the "dummy" metrics exporters got their pod ID values, but there are certainly examples (as in Stackdriver logging metrics) where POD_ID ends up being a name and not a UID.
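For anyone hitting the same issue, the Downward API part is just an environment variable on the exporting container; a minimal sketch (the variable name follows the POD_ID convention above):

env:
- name: POD_ID
  valueFrom:
    fieldRef:
      fieldPath: metadata.uid         # the pod's UID, not its name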
Kubernetes version:
v1.15.2
Scenario:
Kubernetes v1.15.2 has added some new API versions, for example autoscaling/v2beta2 in the autoscaling group. But after reading the HorizontalController structure in the Kubernetes code under src\k8s.io\kubernetes\pkg\controller\podautoscaler\, all of the members of HorizontalController are typed against autoscaling/v1:
type HorizontalController struct {
    scaleNamespacer scaleclient.ScalesGetter                         // ==> autoscaling/v1
    hpaNamespacer   autoscalingclient.HorizontalPodAutoscalersGetter // ==> autoscaling/v1
    mapper          apimeta.RESTMapper
    replicaCalc     *ReplicaCalculator
    eventRecorder   record.EventRecorder
    downscaleStabilisationWindow time.Duration
    // hpaLister is able to list/get HPAs from the shared cache from the informer passed in to
    // NewHorizontalController.
    hpaLister       autoscalinglisters.HorizontalPodAutoscalerLister // ==> autoscaling/v1
    hpaListerSynced cache.InformerSynced                             // ==> autoscaling/v1
    // podLister is able to list/get Pods from the shared cache from the informer passed in to
    // NewHorizontalController.
    podLister       corelisters.PodLister
    podListerSynced cache.InformerSynced
    // Controllers that need to be synced
    queue workqueue.RateLimitingInterface
    // Latest unstabilized recommendations for each autoscaler.
    recommendations map[string][]timestampedRecommendation
}
So how does Kubernetes maintain the autoscaling/v2beta2 resources with this HorizontalController?
In the official Kubernetes documentation you can find the following information:
API Object
The Horizontal Pod Autoscaler is an API resource in the Kubernetes autoscaling API group. The current stable version, which only includes support for CPU autoscaling, can be found in the autoscaling/v1 API version.
The beta version, which includes support for scaling on memory and custom metrics, can be found in autoscaling/v2beta2. The new fields introduced in autoscaling/v2beta2 are preserved as annotations when working with autoscaling/v1.
More details about the API object can be found at HorizontalPodAutoscaler Object.
Also, according to the Kubernetes API Overview documentation, under API versioning:
API versioning
To eliminate fields or restructure resource representations,
Kubernetes supports multiple API versions, each at a different API
path. For example: /api/v1 or /apis/extensions/v1beta1.
The version is set at the API level rather than at the resource or
field level to:
Ensure that the API presents a clear and consistent view of system resources and behavior.
Enable control access to end-of-life and/or experimental APIs.
The JSON and Protobuf serialization schemas follow the same guidelines
for schema changes. The following descriptions cover both formats.
So you can find all the API versions for autoscaling, such as v2beta2, under kubernetes/pkg/apis/autoscaling/.
For example, using HTTP GET it would look like this: GET /apis/autoscaling/v2beta2
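Assuming kubectl is configured against the cluster, the same request can be issued with:

kubectl get --raw /apis/autoscaling/v2beta2

which returns the APIResourceList for that group/version if the API server serves it.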
I've set up Prometheus to monitor Kubernetes metrics by following the Prometheus documentation.
A lot of useful metrics now show up in prometheus.
However, I can't see any metrics referencing the status of my pods or nodes.
Ideally, I'd like to be able to graph pod status (Running, Pending, CrashLoopBackOff, Error) and node status (NodeReady, Ready).
Is this metric anywhere? If not, can I add it somewhere? And how?
The regular Kubernetes setup does not expose these metrics; further discussion here.
However, another service can be used to collect these cluster-level metrics: https://github.com/kubernetes/kube-state-metrics.
This currently provides node_status_ready and pod_container_restarts, which sound like what I want.
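For what it's worth, more recent kube-state-metrics releases expose this information as kube_node_status_condition and kube_pod_status_phase; a sketch of the kind of PromQL this enables (check the metric and label names against the version you actually run):

# nodes currently reporting Ready
sum(kube_node_status_condition{condition="Ready", status="true"})

# pods per phase (Running, Pending, Failed, ...)
sum(kube_pod_status_phase) by (phase)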
I don't think such metrics exist.
You have to modify the source code to add them. Take a look at this file on how to register a metric: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/metrics/metrics.go,
and take a look at this line on how to record a metric: https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/pleg/generic.go#L180
I've found that I can monitor these metrics using Heapster and Snap, which is a plausible workaround for my case. Let me know if that's something you're also using, and I'll give you the exact metrics to get this data.