Autoscaling a Google Cloud Endpoints backend deployment declaratively (in the YAML)? - kubernetes

I have successfully followed the documentation here and here to deploy an API spec and GKE backend to Cloud Endpoints.
This has left me with a deployment.yaml that looks like this:
apiVersion: v1
kind: Service
metadata:
  name: esp-myproject
spec:
  ports:
  - port: 80
    targetPort: 8081
    protocol: TCP
    name: http
  selector:
    app: esp-myproject
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: esp-myproject
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: esp-myproject
    spec:
      containers:
      - name: esp
        image: gcr.io/endpoints-release/endpoints-runtime:1
        args: [
          "--http_port=8081",
          "--backend=127.0.0.1:8080",
          "--service=myproject1-0-0.endpoints.myproject.cloud.goog",
          "--rollout_strategy=managed",
        ]
        ports:
        - containerPort: 8081
      - name: myproject
        image: gcr.io/myproject/my-image:v0.0.1
        ports:
        - containerPort: 8080
This creates a single replica of the app on the backend. So far, so good...
I now want to update the yaml file to declaratively specify auto-scaling parameters to enable multiple replicas of the app to run alongside each other when traffic to the endpoint justifies more than one.
I have read around (O'Reilly book: Kubernetes Up & Running, GCP docs, K8s docs), but there are two things on which I'm stumped:
I've read a number of times about the HorizontalPodAutoscaler, and it's not clear to me whether the Deployment must make use of one in order to enjoy the benefits of autoscaling.
If so, I have seen examples in the docs of how to define the spec for the HorizontalPodAutoscaler in YAML, as shown below - but how would I combine this with my existing deployment.yaml?
HorizontalPodAutoscaler example (from the docs):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
Thanks in advance to anyone who can shed some light on this for me.

I've read a number of times about the HorizontalPodAutoscaler, and it's not clear to me whether the Deployment must make use of one in order to enjoy the benefits of autoscaling.
It doesn't have to, but it's recommended, and it's already built in. You could build your own automation that scales up and down, but the question is why you would, since this is already supported via the HPA.
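For example, instead of writing the manifest by hand, you could create the same autoscaler imperatively (assuming resource metrics are available in the cluster, as they are on GKE):
kubectl autoscale deployment esp-myproject --cpu-percent=50 --min=1 --max=10
The declarative YAML equivalent, which answers your second question, is shown below.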
If so, I have seen examples in the docs of how to define the spec for the HorizontalPodAutoscaler in YAML, as shown below - but how would I combine this with my existing deployment.yaml?
It should be straightforward. You basically reference your deployment in the HPA definition:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-esp-project-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: esp-myproject  # <== here
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
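You can keep the HPA in its own file or append it to your existing deployment.yaml as an additional document separated by ---; either way, applying the file creates or updates everything, for example:
kubectl apply -f deployment.yaml
kubectl get hpa
The second command lets you verify that the autoscaler exists and is tracking the esp-myproject Deployment.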

I faced the same issue; here is what worked for me.
If you are on GKE (versions around 1.12 to 1.14), where the only enabled autoscaling APIs are autoscaling/v1 and autoscaling/v2beta1, you won't be able to apply an autoscaling/v2beta2 manifest. However, you can express the same thing like this:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: core-deployment
  namespace: default
spec:
  maxReplicas: 9
  minReplicas: 5
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: core-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageValue: 500m
If you want to scale based on utilization instead:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: core-deployment
  namespace: default
spec:
  maxReplicas: 9
  minReplicas: 5
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: core-deployment
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
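Before choosing which apiVersion to write your manifest against, you can check which autoscaling APIs the cluster actually serves, for example:
kubectl api-versions | grep autoscaling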

Related

How to use the Kubernetes HorizontalPodAutoscaler with the memory metric?

I'm trying to understand how the Kubernetes HorizontalPodAutoscaler works.
Until now, I have used the following configuration:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: my-deployment
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-deployment
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
This uses the targetCPUUtilizationPercentage parameter, but I would like to use a metric for the percentage of memory used, and I was not able to find any example.
Any hint?
I also found that there is this type of configuration to support multiple metrics, but the apiVersion is autoscaling/v2alpha1. Can this be used in a production environment?
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2alpha1
metadata:
  name: WebFrontend
spec:
  scaleTargetRef:
    kind: ReplicationController
    name: WebFrontend
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 80
  - type: Object
    object:
      target:
        kind: Service
        name: Frontend
      metricName: hits-per-second
      targetValue: 1k
Here is an example manifest for what you need, which includes memory metrics:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: web-servers
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-servers
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 20
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 30Mi
An important thing to notice is that it uses the autoscaling/v2beta2 API version, so you need to follow all the previous instructions listed here.
Regarding the possibility of using autoscaling/v2alpha1: yes, you can use it, as it includes support for scaling on memory and custom metrics, as this URL specifies. But keep in mind that alpha versions are released for testing and are not final, so they are not recommended for production.
For more autoscaling/v2beta2 YAML examples and a deeper look into memory metrics, you can take a look at this thread.
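Once applied, you can watch how the HPA evaluates both metrics (and which one is currently driving the replica count) with, for example:
kubectl describe hpa web-servers
kubectl get hpa web-servers --watch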

Not able to use the advanced behavior config in a GKE cluster, even with the latest Kubernetes version

apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: test
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 0
      policies:
      - type: Percent
        value: 10
        periodSeconds: 15
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
      #- type: Percent
      #  value: 100
      #  periodSeconds: 15
      - type: Pods
        value: 5
        periodSeconds: 15
  maxReplicas: 30
  minReplicas: 2
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
As per the official Kubernetes docs, the HPA behavior field is available from Kubernetes v1.18, but GKE has its own versioning. GKE also serves the autoscaling/v2beta2 API version, yet the behavior field is not supported.
GKE VERSION: 1.16.13-gke.1
Am I the only one to face this issue?
Yes, you are right: GKE has its own versioning. You can find more details here.
Note: The Kubernetes API is versioned separately from Kubernetes itself. Refer to the Kubernetes API documentation for information about Kubernetes API versioning.
Unfortunately, GKE does not support the behavior parameter in apiVersion: autoscaling/v2beta2:
error: error validating "hpa.yaml": error validating data: ValidationError(HorizontalPodAutoscaler.spec): unknown field "behavior" in io.k8s.api.autoscaling.v2beta2.HorizontalPodAutoscalerSpec; if you choose to ignore these errors, turn validation off with --validate=false
However, it can be freely used with kubeadm and Minikube on Kubernetes 1.18+.
There is already a Public Issue Tracker entry related to this. You can add yourself to the CC list in this PIT to receive updates on the issue.
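As a quick local check, kubectl explain should error out on the behavior field if your cluster's API schema doesn't know it (as on GKE here), for example something like:
kubectl explain horizontalpodautoscaler.spec.behavior --api-version=autoscaling/v2beta2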

Kubernetes HPA wrong metrics?

I've created a GKE test cluster on Google Cloud. It has 3 nodes with 2 vCPUs / 8 GB RAM. I've deployed two Java apps on it.
Here's the yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapi
spec:
  selector:
    matchLabels:
      app: myapi
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myapi
    spec:
      containers:
      - image: eu.gcr.io/myproject/my-api:latest
        name: myapi
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: myapi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myfrontend
spec:
  selector:
    matchLabels:
      app: myfrontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myfrontend
    spec:
      containers:
      - image: eu.gcr.io/myproject/my-frontend:latest
        name: myfrontend
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: myfrontend
---
Then I wanted to add an HPA with the following details:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myfrontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myfrontend
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
---
If I check kubectl top pods it shows some really weird metrics:
NAME                          CPU(cores)   MEMORY(bytes)
myapi-6fcdb94fd9-m5sh7        194m         1074Mi
myapi-6fcdb94fd9-sptbb        193m         1066Mi
myapi-6fcdb94fd9-x6kmf        200m         1108Mi
myapi-6fcdb94fd9-zzwmq        203m         1074Mi
myfrontend-788d48f456-7hxvd   0m           111Mi
myfrontend-788d48f456-hlfrn   0m           113Mi
HPA info:
NAME         REFERENCE                TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapi        Deployment/myapi         196%/80%   2         4         4          32m
myfrontend   Deployment/myfrontend    0%/50%     2         5         2          32m
But if I check uptime in one of the pods, it shows a much lower value:
[myapi#myapi-6fcdb94fd9-sptbb /opt/]$ uptime
09:49:58 up 47 min, 0 users, load average: 0.48, 0.64, 1.23
Any idea why these show completely different things? Why does the HPA report around 200% current CPU utilization? Because of this it also runs at the maximum replica count while idle. Any idea?
The targetCPUUtilizationPercentage of the HPA is a percentage of the CPU requests of the containers of the target Pods. If you don't specify any CPU requests in your Pod specifications, the HPA can't do its calculations.
In your case it seems that the HPA assumes 100m as the CPU requests (or perhaps you have a LimitRange that sets the default CPU request to 100m). The current usage of your Pods is about 200m and that's why the HPA displays a utilisation of about 200%.
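For reference, a hypothetical LimitRange like the one below, if present in the namespace, would silently give every container a default CPU request of 100m, which would match the numbers above:
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-defaults  # hypothetical example, not from the question
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m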
To set up the HPA correctly, you need to specify CPU requests for your Pods. Something like:
containers:
- image: eu.gcr.io/myproject/my-api:latest
  name: myapi
  imagePullPolicy: Always
  ports:
  - containerPort: 8080
    name: myapi
  resources:
    requests:
      cpu: 500m
Or whatever value your Pods require. If you set the targetCPUUtilizationPercentage to 80, the HPA will trigger an upscale operation at 400m usage, because 80% of 500m is 400m.
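With requests in place you can sanity-check the numbers; for example, the TARGETS column of:
kubectl get hpa myapi
should now report utilisation relative to the 500m request instead of an assumed 100m.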
Besides that, you're using an outdated version of the HorizontalPodAutoscaler API:
Your version: autoscaling/v1
Newest version: autoscaling/v2beta2
With the v2beta2 version, the specification looks a bit different. Something like:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
See examples.
However, the CPU utilisation mechanism described above still applies.

Kubernetes - HPA metrics - memory & cpu together

Is it possible to keep 'cpu' and 'memory' metrics together, as shown below? This doesn't seem to work: I tried the manifest below as an HPA, but the pods instantly grew to 5.
That's not what I was expecting.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-metrics
  namespace: myschema
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 500Mi
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 70
If I keep each metric in its own HPA, it doesn't complain. Is it best practice to set both metrics for a service? Is there any other way to set both metrics?
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-metrics-memory
  namespace: myschema
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: memory
      targetAverageValue: 500Mi

apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-metrics-cpu
  namespace: myschema
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 1
  maxReplicas: 3
  metrics:
  - type: Resource
    resource:
      name: cpu
      targetAverageUtilization: 70
Starting from Kubernetes v1.6, support for scaling based on multiple metrics has been added.
I would suggest trying to switch to the autoscaling/v2beta2 API:
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#support-for-multiple-metrics
https://kubernetes.io/docs/reference/kubernetes-api/workload-resources/horizontal-pod-autoscaler-v2beta2/
The metrics field is of type []MetricSpec, and the HPA evaluates each entry separately: "the maximum replica count across all metrics will be used".
So yes, a single file with both metrics is possible, as sketched below.
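A minimal sketch of that combined HPA under autoscaling/v2beta2, assuming the same myservice Deployment and the thresholds from the question:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myservice-metrics
  namespace: myschema
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myservice
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 500Mi
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
Note that with multiple metrics the pod count is driven by whichever metric proposes the most replicas, so a low memory threshold can pin the deployment at maxReplicas even while CPU is idle.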

Role of labels in istio's DestinationRule

I am going through the traffic management section of Istio's documentation.
In a DestinationRule example, it configures several service subsets.
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: my-destination-rule
spec:
  host: my-svc
  trafficPolicy:
    loadBalancer:
      simple: RANDOM
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
    trafficPolicy:
      loadBalancer:
        simple: ROUND_ROBIN
  - name: v3
    labels:
      version: v3
My question (since it is not clear in the documentation) is about the role of spec.subsets[].labels.
Do these labels refer to:
labels in the corresponding k8s Deployment?
or
labels in the Pods of the Deployment?
Where exactly (in terms of k8s manifests) do the above labels reside?
Istio sticks to the Kubernetes labeling paradigm used to identify resources within the cluster.
Since this particular DestinationRule is intended to determine, at the network level, which backends are to serve requests, it targets the Pods in the Deployment rather than the Deployment itself (as the Deployment is an abstract resource without any network features).
A good example of this is in the Istio sample application repository:
The Deployment's own metadata doesn't carry a version: v1 label. However, the Pods grouped under it do:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tcp-echo
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tcp-echo
      version: v1
  template:
    metadata:
      labels:
        app: tcp-echo
        version: v1
    spec:
      containers:
      - name: tcp-echo
        image: docker.io/istio/tcp-echo-server:1.1
        imagePullPolicy: IfNotPresent
        args: [ "9000", "hello" ]
        ports:
        - containerPort: 9000
And the DestinationRule picks these objects by their version label:
apiVersion: networking.istio.io/v1alpha3
kind: DestinationRule
metadata:
  name: tcp-echo-destination
spec:
  host: tcp-echo
  subsets:
  - name: v1
    labels:
      version: v1
  - name: v2
    labels:
      version: v2
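To see exactly which Pods each subset would select, you can list the Pods together with their labels, for example:
kubectl get pods -l app=tcp-echo --show-labels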