Send Kubernetes (GKE) service layer metrics to GCP Load Balancer

I am using GKE and have an application, app1 (a pod), which is exposed using a NodePort service and then put behind an Ingress.
The Ingress controller has launched a GCP load balancer, and requests arriving on the path /app1/ are routed to my application.
I launched the stackdriver-metrics adapter inside the cluster and then configured an HPA which uses the requests/second metric from the load balancer. The HPA reads this as an External metric for a particular backend name:
- external:
    metricName: loadbalancing.googleapis.com|https|request_count
    metricSelector:
      matchLabels:
        resource.labels.backend_target_name: k8s-be-30048--my-backend
    targetAverageValue: 20
  type: External
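For reference, this fragment sits inside the metrics list of a full HPA manifest, roughly like the sketch below (the autoscaling/v2beta1 API is implied by the field names; the HPA name, Deployment name, and replica bounds are hypothetical, as they are not given in the question):

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: app1-hpa              # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: app1                # assumed Deployment name
  minReplicas: 1
  maxReplicas: 10             # illustrative bounds
  metrics:
  - type: External
    external:
      metricName: loadbalancing.googleapis.com|https|request_count
      metricSelector:
        matchLabels:
          resource.labels.backend_target_name: k8s-be-30048--my-backend
      targetAverageValue: 20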
Everything works perfectly. Here is the problem:
Some of the other apps running inside the Kubernetes cluster also call app1. Those apps call app1 via the Kubernetes FQDN app1.default.svc.cluster.local, not via the load balancer route. That means these requests won't go through the ingress load balancer, so they are not counted by the HPA in any way.
So the total requests coming in (Ct) are those via the load balancer (C1) plus those via the FQDN (C2): Ct = C1 + C2. My guess is that the HPA will only take C1 into account, not Ct, so it will not scale my app correctly because of the way the metric is counted here. For example, if Ct is 120 but C1 is 90, the HPA might run 3 pods when it should actually run 4.
Am I wrong here to assume that requests coming via the FQDN are not counted by the load balancer?
If they are indeed not counted, I think I will have to use something which counts requests at the pod level, something like a Prometheus middleware. Can you guys suggest anything else?

Answering the following comment:
Yup, that's the obstruction. There is no way to forecast or relate the two kinds of traffic. Anyway, how would it help if it could be forecasted?
If it could be forecasted (for example, if the split is always 70% external / 30% internal), you could adjust the scaling metric so that it already includes the traffic the load balancer metric isn't aware of.
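As a worked example, if the desired total is 20 requests/second per pod and the split is reliably 70% external / 30% internal, you could set the load balancer target to 20 × 0.7 = 14 requests/second per pod; scaling on the externally visible 70% then implicitly provisions capacity for the invisible 30% as well.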
Instead of collecting metrics from the load balancer itself, which will not take the internal traffic into consideration, you can opt to use Custom Metrics (for example: queries per second).
Your application can report a custom metric to Cloud Monitoring. You can configure Kubernetes to respond to these metrics and scale your workload automatically. For example, you can scale your application based on metrics such as queries per second, writes per second, network performance, latency when communicating with a different application, or other metrics that make sense for your workload.
A custom metric can be selected for any of the following:
A particular node, Pod, or any Kubernetes object of any kind, including a CustomResourceDefinition (CRD).
The average value for a metric reported by all Pods in a Deployment
-- Cloud.google.com: Kubernetes Engine: Custom and external metrics: Custom metrics
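As a minimal sketch, the metrics block of the HPA could then look like this in the autoscaling/v2beta1 API (the metric name and target value are hypothetical; the exact name depends on how your metrics adapter maps the Cloud Monitoring metric):

metrics:
- type: Pods
  pods:
    metricName: queries_per_second   # hypothetical custom metric reported by each pod
    targetAverageValue: 20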
There is official documentation about creating Custom Metrics:
Cloud.google.com: Monitoring: Custom metrics: Creating metrics
You can also look at the already available metrics in the Metrics Explorer.
You can also use multiple metrics when scaling up/down with HPA:
If you configure a workload to autoscale based on multiple metrics, HPA evaluates each metric separately and uses the scaling algorithm to determine the new workload scale based on each one. The largest scale is selected for the autoscale action.
-- Cloud.google.com: Kubernetes Engine: HorizontalPodAutoscaler
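For instance, here is a sketch combining the load balancer metric from the question with a CPU target (the 70% utilization value is illustrative). Since the largest scale wins, the CPU metric can partially cover the internal traffic through its CPU cost, even though the load balancer never sees those requests:

metrics:
- type: Resource
  resource:
    name: cpu
    targetAverageUtilization: 70
- type: External
  external:
    metricName: loadbalancing.googleapis.com|https|request_count
    metricSelector:
      matchLabels:
        resource.labels.backend_target_name: k8s-be-30048--my-backend
    targetAverageValue: 20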
As more of a workaround solution, you could also use the CPU usage metric.
Additional resources:
Cloud.google.com: Kubernetes Engine: Tutorials: Autoscaling metrics: Custom metrics
Cloud.google.com: Kubernetes Engine: How to: Horizontal pod autoscaling

Related

Kubernetes: using HPA with metrics from other pods

I have:
deployments of services A and B in k8s
Prometheus stack
I want to scale service A when metric m1 of service B changes.
Solutions I found which are more or less unsuitable:
I can define an HPA for service A with the following part of the spec:
- type: Object
  object:
    metric:
      name: m1
    describedObject:
      apiVersion: v1
      kind: Pod
      name: certain-pod-of-service-B
    target:
      type: Value
      value: 10k
Technically, it will work. But it's not suited to the dynamic nature of k8s, since it hard-codes the name of a particular pod of service B.
Also, I can't use a Pods metric (metrics: - type: Pods pods:) in the HPA, because it would request the m1 metric from the pods of service A (which obviously don't have it).
I could define a custom metric in prometheus-adapter which queries the m1 metric from the pods of service B. That's more suitable, but it looks like a workaround, because I already have the metric m1. The same goes for external metrics.
I feel like I'm missing something, because this doesn't seem like an unrealistic case :)
So, please advise me: how do I scale one service by a metric of another in k8s?
I decided to provide a Community Wiki answer that may help other people facing a similar issue.
The Horizontal Pod Autoscaler is a Kubernetes feature that allows you to scale applications based on one or more monitored metrics.
As we can find in the Horizontal Pod Autoscaler documentation:
The Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with custom metrics support, on some other application-provided metrics).
There are three groups of metrics that we can use with the Horizontal Pod Autoscaler:
resource metrics: predefined resource usage metrics (CPU and memory) of pods and nodes.
custom metrics: custom metrics associated with a Kubernetes object.
external metrics: custom metrics not associated with a Kubernetes object.
Any HPA target can be scaled based on the resource usage of the pods (or containers) in the scaling target. CPU utilization is a resource metric, and you can specify other resource metrics besides CPU (e.g. memory); a minimal example follows below. This is the easiest and most basic method of scaling, but we can use more specific metrics through custom metrics or external metrics.
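A minimal resource-metric block in the autoscaling/v2beta2 API looks like this (the 70% target is illustrative):

metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 70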
There is one major difference between custom metrics and external metrics (see: Custom and external metrics for autoscaling workloads):
Custom metrics and external metrics differ from each other:
A custom metric is reported from your application running in Kubernetes.
An external metric is reported from an application or service not running on your cluster, but whose performance impacts your Kubernetes application.
All in all, in my opinion it is okay to use custom metrics in the case above; I did not find any other suitable way to accomplish this task. A sketch of what this could look like follows.
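One option is to point an Object metric at service B's Deployment rather than at a single pod, which avoids hard-coding a pod name. This is a sketch only: it assumes prometheus-adapter (or another custom metrics adapter) is configured to expose m1 on the Deployment object, and the names and bounds are placeholders:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: service-a
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: service-a          # the workload being scaled
  minReplicas: 1
  maxReplicas: 10            # illustrative bounds
  metrics:
  - type: Object
    object:
      metric:
        name: m1
      describedObject:
        apiVersion: apps/v1
        kind: Deployment
        name: service-b      # the object the metric describes
      target:
        type: Value
        value: 10k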

Expose prometheus data outside the cluster

We have components which use the Go library to write status to Prometheus, and we are able to see the data in the Prometheus UI. We also have components outside the K8S cluster which need to pull the data from Prometheus. How can I expose these metrics? Are there any components I should use?
You may want to check the Federation section of the Prometheus documentation.
Federation allows a Prometheus server to scrape selected time series
from another Prometheus server. Commonly, it is used to either achieve scalable Prometheus monitoring setups or to pull related metrics from one service's Prometheus into another.
This would require exposing the Prometheus service outside of the cluster with an Ingress or a NodePort service and configuring the central Prometheus to scrape metrics from the exposed service endpoint. You will also have to set up proper authentication. Here's an example of it.
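A minimal federation scrape config on the central Prometheus could look like the sketch below; the match[] selector and the target hostname are placeholders for your own setup:

scrape_configs:
  - job_name: 'federate'
    scrape_interval: 15s
    honor_labels: true
    metrics_path: '/federate'
    params:
      'match[]':
        - '{job="my-app"}'                   # placeholder: the series to federate
    scheme: https                            # assuming the endpoint is exposed over TLS
    static_configs:
      - targets:
          - 'prometheus.example.com:443'     # placeholder: the exposed endpoint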
A second way that comes to mind is to use kube-state-metrics:
kube-state-metrics is a simple service that listens to the Kubernetes
API server and generates metrics about the state of the objects.
Metrics are exported on an HTTP endpoint and are designed to be consumed either by Prometheus itself or by a scraper that is compatible with Prometheus client endpoints. However, this differs from the Metrics Server: it generates metrics about the state of Kubernetes objects (node status, node capacity, number of desired replicas, pod status, etc.).

Horizontal Pod Autoscaling using REST API exposed by the application in container

I am using minikube on Windows; there is only one node, "master".
The Spring Boot application deployed has a REST endpoint which reports the number of clients it is currently serving. I would like to scale out horizontally, i.e. automatically spin up a new pod, when the number of requests reaches some limit.
Let's say:
There is 1 pod in the cluster.
If the request count reaches 50 for Pod 1, spin up a new pod (Pod 2).
If the request count reaches 50 for both Pod 1 and Pod 2, spin up a new pod (Pod 3).
I tried researching how to achieve this but was not able to figure it out.
All I could find was scaling based on CPU usage with the HorizontalPodAutoscaler (HPA).
It would be helpful to receive guidance on how to achieve this using the Kubernetes HPA.
I believe you can start from the autoscaling on custom metrics article. As far as I can see, this is achievable using custom metrics in conjunction with the Prometheus Adapter for Kubernetes Metrics APIs (an implementation of the custom.metrics.k8s.io API using Prometheus).
Prometheus Adapter for Kubernetes Metrics APIs repo contains an implementation of the Kubernetes resource metrics API and custom metrics API.
This adapter is therefore suitable for use with the autoscaling/v2 Horizontal Pod Autoscaler in Kubernetes 1.6+.
Info from autoscaling on custom metrics:
Notice that you can specify other resource metrics besides CPU. By default, the only other supported resource metric is memory. These resources do not change names from cluster to cluster, and should always be available, as long as the metrics.k8s.io API is available.
The first of these alternative metric types is pod metrics. These metrics describe pods, and are averaged together across pods and compared with a target value to determine the replica count. They work much like resource metrics, except that they only support a target type of AverageValue.
Pod metrics are specified using a metric block like this:
type: Pods
pods:
  metric:
    name: packets-per-second
  target:
    type: AverageValue
    averageValue: 1k
The second alternative metric type is object metrics. These metrics describe a different object in the same namespace, instead of describing pods. The metrics are not necessarily fetched from the object; they only describe it. Object metrics support target types of both Value and AverageValue. With Value, the target is compared directly to the returned metric from the API. With AverageValue, the value returned from the custom metrics API is divided by the number of pods before being compared to the target. The following example is the YAML representation of the requests-per-second metric.
type: Object
object:
  metric:
    name: requests-per-second
  describedObject:
    apiVersion: networking.k8s.io/v1beta1
    kind: Ingress
    name: main-route
  target:
    type: Value
    value: 2k
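Putting this together for your case, a full HPA scaling the Spring Boot deployment on a per-pod client count could look like the sketch below. It assumes the application's counter is scraped by Prometheus and exposed through the adapter as a pod metric; the deployment name, metric name, and bounds are hypothetical:

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: spring-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: spring-app             # hypothetical deployment name
  minReplicas: 1
  maxReplicas: 10                # illustrative bound
  metrics:
  - type: Pods
    pods:
      metric:
        name: active_clients     # hypothetical metric exposed via the adapter
      target:
        type: AverageValue
        averageValue: "50"       # spin up a new pod beyond 50 clients per pod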
Also, maybe the following will be helpful for your future investigations:
Autoscaling on more specific metrics
Autoscaling on metrics not related to Kubernetes objects
Hope it helps

Kubernetes | Monitor HPA's Current and Target CPU Utilization using Prometheus

I want to monitor the current and target CPU utilization at the deployment/HPA level using Prometheus. GCP Kubernetes monitoring has these metrics available on the Stackdriver dashboard, but I could not find how they are tracked.
The following link contains the list of HPA metrics exposed, which does not include the current/target CPU utilization:
https://github.com/kubernetes/kube-state-metrics/blob/1dfe6681e9/docs/horizontalpodautoscaler-metrics.md
I think you can take a look at cAdvisor. cAdvisor is part of the kubelet service and acts as a monitoring agent for the performance and resource usage of the containers on a particular node. By default, cAdvisor exposes container statistics as Prometheus metrics, available on the /metrics endpoint for each container. I guess you can use the container_cpu_load_average_10s metric to fetch the CPU utilization of each container of the relevant Pod/Deployment.
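If you also scrape kube-state-metrics, a recording rule can approximate the HPA's notion of CPU utilization (usage divided by requests). This is a sketch under assumptions: the my-app pod name pattern is a placeholder, the kube_pod_container_resource_requests name assumes kube-state-metrics v2 naming, and the pod label on cAdvisor metrics varies across Kubernetes versions (older releases use pod_name):

groups:
  - name: hpa-cpu-utilization
    rules:
      - record: deployment:cpu_utilization:ratio
        expr: |
          sum(rate(container_cpu_usage_seconds_total{pod=~"my-app-.*", container!=""}[5m]))
          /
          sum(kube_pod_container_resource_requests{pod=~"my-app-.*", resource="cpu"})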

custom metrics with HorizontalPodAutoscaler on GKE

I'm trying to set up custom metrics with a HorizontalPodAutoscaler on a 1.6.1 alpha GKE cluster.
According to https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#prerequisites I need to set --horizontal-pod-autoscaler-use-rest-clients on kube-controller-manager to enable metrics collection. From GKE, it's not clear whether it's possible to set flags on kube-controller-manager. Any ideas?
Has anyone gotten custom metrics working with HPA on GKE?
You can't manipulate any of the Kubernetes cluster components directly in GKE (Google Container Engine); Google takes care of that. If you want that level of control, you may need to deploy your own Kubernetes cluster.
On GKE we have been supporting HPA with custom metrics since version 1.9. If you have a group of horizontally autoscaled pods inside your cluster each exporting a custom metric then you can set an average per pod target for that metric.
An example of that would be an autoscaled deployment of a frontend where each replica exports its current QPS. One could set the average target of QPS per frontend pod and use the HPA to scale the deployment up and down accordingly. You can find the documentation and a tutorial explaining how to set this up here: https://cloud.google.com/kubernetes-engine/docs/tutorials/custom-metrics-autoscaling
Kubernetes 1.10, becoming available on GKE, will extend the support for custom metrics to include metrics not attached to any Kubernetes object. This will give you the ability to scale a deployment based on any metric listed here, for example the number of messages in a Google Pub/Sub queue.
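For example, once external metrics are supported, a metrics block targeting a Pub/Sub subscription could look like the sketch below (autoscaling/v2beta1 syntax; the subscription ID and target value are placeholders):

metrics:
- type: External
  external:
    metricName: pubsub.googleapis.com|subscription|num_undelivered_messages
    metricSelector:
      matchLabels:
        resource.labels.subscription_id: my-subscription   # placeholder
    targetAverageValue: 100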