I am trying to auto-scale my Redis workers based on queue size. I am collecting the metrics using redis_exporter and prometheus-to-sd sidecars in my Redis deployment, like so:
spec:
  containers:
  - name: master
    image: redis
    env:
    - name: MASTER
      value: "true"
    ports:
    - containerPort: 6379
    resources:
      limits:
        cpu: "100m"
      requests:
        cpu: "100m"
  - name: redis-exporter
    image: oliver006/redis_exporter:v0.21.1
    env:
    ports:
    - containerPort: 9121
    args: ["--check-keys=rq*"]
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
  - name: prometheus-to-sd
    image: gcr.io/google-containers/prometheus-to-sd:v0.9.2
    command:
    - /monitor
    - --source=:http://localhost:9121
    - --stackdriver-prefix=custom.googleapis.com
    - --pod-id=$(POD_ID)
    - --namespace-id=$(POD_NAMESPACE)
    - --scrape-interval=15s
    - --export-interval=15s
    env:
    - name: POD_ID
      valueFrom:
        fieldRef:
          apiVersion: v1
          fieldPath: metadata.uid
    - name: POD_NAMESPACE
      valueFrom:
        fieldRef:
          fieldPath: metadata.namespace
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
I can then view the metric (redis_key_size) in Metrics Explorer as:
metric.type="custom.googleapis.com/redis_key_size"
resource.type="gke_container"
(I CAN'T view the metric if I change resource.type=k8_pod)
However, I can't get the HPA to read these metrics: it reports a "failed to get metrics" error, and I can't figure out the correct Object definition.
I've tried both Pod and Deployment for .object.target.kind. With Deployment I get the additional error "Get namespaced metric by name for resource \"deployments\" is not implemented".
I don't know whether this issue is related to resource.type="gke_container", or how to change that.
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Object
    object:
      target:
        kind: <not sure>
        name: <not sure>
      metricName: redis_key_size
      targetValue: 4
--- Update ---
This works if I use kind: Pod and manually set name to the name of the pod created by the deployment, but this is far from ideal.
I also tried this setup using type Pods, but the HPA says it can't read the metrics: horizontal-pod-autoscaler failed to get object metric value: unable to get metric redis_key_size: no metrics returned from custom metrics API
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
As a workaround for deployments, it appears that the metrics have to be exported from pods in the target deployment.
To get this working, I had to move the prometheus-to-sd container into the deployment I wanted to scale and scrape the metrics exposed by redis-exporter in the Redis deployment through the Redis service. That meant exposing port 9121 on the Redis service and changing the command-line argument of the prometheus-to-sd container as follows:
- --source=:http://localhost:9121 -> - --source=:http://my-redis-service:9121
and then using this HPA:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: {{ template "webapp.backend.fullname" . }}-workers
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: {{ template "webapp.backend.fullname" . }}-workers
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - type: Pods
    pods:
      metricName: redis_key_size
      targetAverageValue: 4
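To see how a Pods metric with targetAverageValue drives the replica count, here is a rough Python sketch of the documented HPA rule (desired = ceil(current replicas * average / target), with the default 10% tolerance). The function name and the sample numbers are made up for illustration. It assumes every worker pod's sidecar scrapes the same Redis service and therefore reports the same queue length, which is my reading of the workaround above rather than something I have verified.

```python
import math


def desired_worker_replicas(current_replicas, per_pod_values, target_average,
                            min_replicas, max_replicas, tolerance=0.1):
    """Approximate the HPA calculation for a Pods metric with a target average."""
    average = sum(per_pod_values) / len(per_pod_values)
    ratio = average / target_average
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: the HPA leaves the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The HPA never goes outside minReplicas..maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical: 2 workers, each sidecar reports a queue length of 10, target is 4.
print(desired_worker_replicas(2, [10, 10], 4, min_replicas=1, max_replicas=4))  # 4
```

Under that assumption the average does not drop as replicas are added, so the workers stay scaled up until the queue itself shrinks below the target.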
Related
I have a deployment manifest like the one below, which runs a single-pod Redis server. The Dockerfile for this Redis image has a startup.sh that performs some startup tasks if the DB_TASK environment variable is 1.
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-server
  labels:
    app: redis-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis-server
  template:
    metadata:
      labels:
        app: redis-server
    spec:
      containers:
      - name: redis-server
        image: secretregistry.azurecr.io/redis-server:__imgTag__
        env:
        - name: DB_TASK
          value: "1" # env values must be strings, so the number is quoted
        args:
        - --requirepass
        - __RedisSecret__
        resources:
          requests:
            memory: "4Gi"
            cpu: "1"
          limits:
            memory: "4Gi"
            cpu: "1"
        ports:
        - containerPort: 6379
I also have an HPA that scales this server up and down based on CPU usage:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: redis-server-hpa
spec:
  maxReplicas: 2 # define max replica count
  minReplicas: 1 # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis-server
  targetCPUUtilizationPercentage: 51 # target CPU utilization
The problem is that when load goes above 51%, it scales up a second Redis server that also has DB_TASK set to 1. How can I configure the HPA so that a scaled-up replica gets DB_TASK set to 0 and does not perform the startup work again?
I have a cluster where I have deployed multiple applications, and I want to horizontally scale one of the deployments.
The following is my YAML for the deployment. How can I achieve this?
Note: I have tried changing replicas to more than 1, applying the new config, and restarting the deployment, but I want to know whether I need to add any policies, specs, etc. to get proper horizontal scaling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: preview
  namespace: default
  resourceVersion: {}
  uid: {}
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: preview
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: preview
    spec:
      containers:
      - image: gcr.io/{project name}/{image name}
        imagePullPolicy: Always
        name: preview
        resources:
          requests:
            cpu: 10m
            memory: 450Mi
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        volumeMounts:
        - mountPath: /app/data
          name: data
        - mountPath: /app/conf
          name: config
          readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: preview
      - name: config
        secret:
          defaultMode: 420
          secretName: preview-secrets
You can use the HPA (Horizontal Pod Autoscaler). Here is what a typical YAML configuration looks like:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa_name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment_name_to_autoscale
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80
You can monitor the scaling using kubectl get hpa.
In GKE, you can achieve this with the Horizontal Pod Autoscaler (HPA). Autoscaling can be triggered by system metrics (e.g. CPU or memory) or custom metrics (e.g. the Pub/Sub queued message count). You can also set the minimum and maximum number of pods to scale between.
Here is a link from GCP for a sample HPA yaml file
Menu > GKE > Workloads > click on your deployment > 3 dots (more actions) > Actions > Autoscale > set metrics > Save
Here is another example of using the HPA with an external metric (e.g. Cloud Pub/Sub):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: your-hpa-name
spec:
  minReplicas: 1
  maxReplicas: 4
  metrics:
  - external:
      metric:
        name: pubsub.googleapis.com|subscription|num_undelivered_messages
        selector:
          matchLabels:
            resource.labels.subscription_id: your-pubsub-subscription-name
      target:
        type: AverageValue
        averageValue: 200
    type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-deployment-name
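For an External metric with an AverageValue target, the HPA divides the total metric value by the target, so the replica count roughly tracks the backlog. A minimal Python sketch of that arithmetic follows; the function name and the backlog numbers are hypothetical, and the controller's tolerance band and stabilization behaviour are left out.

```python
import math


def replicas_for_backlog(total_metric, average_value, min_replicas, max_replicas):
    """Approximate the replica count for an External metric with an AverageValue target."""
    desired = math.ceil(total_metric / average_value)
    # Clamp to the minReplicas / maxReplicas bounds from the HPA spec.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical backlog sizes with averageValue: 200, minReplicas: 1, maxReplicas: 4.
for backlog in (0, 450, 1000):
    print(backlog, replicas_for_backlog(backlog, 200, 1, 4))
# 0 1
# 450 3
# 1000 4
```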
I have the following manifests:
The app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
  labels:
    app: hpa-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-nginx
  template:
    metadata:
      labels:
        app: hpa-nginx
    spec:
      containers:
      - name: hpa-nginx
        image: stacksimplify/kubenginx:1.0.0
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "500Mi"
            cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-service-nginx
  labels:
    app: hpa-nginx
spec:
  type: LoadBalancer
  selector:
    app: hpa-nginx
  ports:
  - port: 80
    targetPort: 80
and its HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-declarative
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 1 # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  targetCPUUtilizationPercentage: 20 # target CPU utilization
Notice that in the HPA, the target CPU is set to 20%.
My question: 20% of what does the HPA use? Is it requests.cpu (i.e. 100m), limits.cpu (i.e. 200m), or something else?
Thank you!
It's based on resources.requests.cpu.
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod.
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
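As a quick worked example with the numbers from the question (requests.cpu of 100m, limits.cpu of 200m, target of 20%), here is a small Python sketch. The helper name and the 30m usage figure are only for illustration.

```python
def cpu_utilization_percent(usage_millicores, request_millicores):
    """CPU utilization as the HPA sees it: usage relative to the CPU request."""
    return 100.0 * usage_millicores / request_millicores


request_m = 100      # resources.requests.cpu from the Deployment in the question
limit_m = 200        # resources.limits.cpu, not used by the HPA calculation
target_percent = 20  # targetCPUUtilizationPercentage

# Average usage per pod at which the target is reached: 20% of the request.
threshold_m = request_m * target_percent / 100
print(threshold_m)  # 20.0 (millicores), not 40.0, which would be 20% of the limit

# Hypothetical pod using 30m of CPU: 30% utilization, which is above the 20% target.
print(cpu_utilization_percent(30, request_m))  # 30.0
```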
I have a Kubernetes deployment file, user.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-deployment
  namespace: stage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user
  template:
    metadata:
      labels:
        app: user
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9022'
    spec:
      nodeSelector:
        env: stage
      containers:
      - name: user
        image: <docker image path>
        imagePullPolicy: Always
        resources:
          limits:
            memory: "512Mi"
            cpu: "250m"
          requests:
            memory: "256Mi"
            cpu: "200m"
        ports:
        - containerPort: 8080
        env:
        - name: MODE
          value: "local"
        - name: PORT
          value: ":8080"
        - name: REDIS_HOST
          value: "xxx"
        - name: KAFKA_ENABLED
          value: "true"
        - name: BROKERS
          value: "xxx"
      imagePullSecrets:
      - name: regcred
---
apiVersion: v1
kind: Service
metadata:
  namespace: stage
  name: user
spec:
  selector:
    app: user
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: user
  namespace: stage
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 300Mi
This deployment is already running with linkerd injected using the command cat user.yaml | linkerd inject - | kubectl apply -f -
Now I want to add the linkerd inject annotation (as mentioned here) and use kubectl apply -f user.yaml, just as I would for a deployment without linkerd injected.
However, with the modified user.yaml (after adding the linkerd.io/inject annotation to the deployment):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-deployment
  namespace: stage
spec:
  replicas: 1
  selector:
    matchLabels:
      app: user
  template:
    metadata:
      labels:
        app: user
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/port: '9022'
        linkerd.io/inject: enabled
    spec:
      nodeSelector:
        env: stage
      containers:
      - name: user
        image: <docker image path>
        imagePullPolicy: Always
        resources:
          limits:
            memory: "512Mi"
            cpu: "250m"
          requests:
            memory: "256Mi"
            cpu: "200m"
        ports:
        - containerPort: 8080
        env:
        - name: MODE
          value: "local"
        - name: PORT
          value: ":8080"
        - name: REDIS_HOST
          value: "xxx"
        - name: KAFKA_ENABLED
          value: "true"
        - name: BROKERS
          value: "xxx"
      imagePullSecrets:
      - name: regcred
---
apiVersion: v1
kind: Service
metadata:
  namespace: stage
  name: user
spec:
  selector:
    app: user
  ports:
  - protocol: TCP
    port: 8080
    targetPort: 8080
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: user
  namespace: stage
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: user-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Resource
    resource:
      name: memory
      target:
        type: AverageValue
        averageValue: 300Mi
When I run kubectl apply -f user.yaml, it throws this error:
service/user unchanged
horizontalpodautoscaler.autoscaling/user configured
Error from server (BadRequest): error when creating "user.yaml": Deployment in version "v1" cannot be handled as a Deployment: v1.Deployment.Spec: v1.DeploymentSpec.Template: v1.PodTemplateSpec.Spec: v1.PodSpec.Containers: []v1.Container: v1.Container.Env: []v1.EnvVar: v1.EnvVar.v1.EnvVar.Value: ReadString: expects " or n, but found 1, error found in #10 byte of ...|,"value":1},{"name":|..., bigger context ...|ue":":8080"}
Can anyone please point out where I have gone wrong in adding the annotation?
Thanks
Try with double quotes, like below:
annotations:
  prometheus.io/scrape: "true"
  prometheus.io/port: "9022"
  linkerd.io/inject: enabled
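Note that the error text itself quotes "value":1 inside v1.EnvVar.Value, which suggests that some env entry in the applied file has an unquoted number as its value, so the same quoting would be needed there. Below is a small Python sketch that checks an already-parsed manifest for such entries. The helper name, the EXAMPLE_FLAG variable, and the sample dict are hypothetical and not taken from the posted file.

```python
def non_string_env_values(manifest):
    """Return (container, env name, value) for env values that are not strings."""
    problems = []
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        # "env" may be missing or empty (None) in a parsed manifest.
        for env in container.get("env") or []:
            value = env.get("value")
            if value is not None and not isinstance(value, str):
                problems.append((container.get("name"), env.get("name"), value))
    return problems


# Hypothetical Deployment, as it would look after YAML parsing.
deployment = {
    "spec": {"template": {"spec": {"containers": [{
        "name": "user",
        "env": [
            {"name": "MODE", "value": "local"},
            {"name": "EXAMPLE_FLAG", "value": 1},  # unquoted in YAML, parsed as int
        ],
    }]}}}
}

print(non_string_env_values(deployment))  # [('user', 'EXAMPLE_FLAG', 1)]
```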
I've created a GKE test cluster on Google Cloud. It has 3 nodes with 2 vCPUs / 8 GB RAM. I've deployed two Java apps on it.
Here's the YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapi
spec:
  selector:
    matchLabels:
      app: myapi
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myapi
    spec:
      containers:
      - image: eu.gcr.io/myproject/my-api:latest
        name: myapi
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: myapi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myfrontend
spec:
  selector:
    matchLabels:
      app: myfrontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myfrontend
    spec:
      containers:
      - image: eu.gcr.io/myproject/my-frontend:latest
        name: myfrontend
        imagePullPolicy: Always
        ports:
        - containerPort: 8080
          name: myfrontend
---
Then I wanted to add an HPA with the following details:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myfrontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myfrontend
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
---
If I check kubectl top pods, it shows some really weird metrics:
NAME                          CPU(cores)   MEMORY(bytes)
myapi-6fcdb94fd9-m5sh7        194m         1074Mi
myapi-6fcdb94fd9-sptbb        193m         1066Mi
myapi-6fcdb94fd9-x6kmf        200m         1108Mi
myapi-6fcdb94fd9-zzwmq        203m         1074Mi
myfrontend-788d48f456-7hxvd   0m           111Mi
myfrontend-788d48f456-hlfrn   0m           113Mi
HPA info:
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapi        Deployment/myapi        196%/80%   2         4         4          32m
myfrontend   Deployment/myfrontend   0%/50%     2         5         2          32m
But if I check uptime on one of the nodes, it shows a much lower value:
[myapi#myapi-6fcdb94fd9-sptbb /opt/]$ uptime
09:49:58 up 47 min, 0 users, load average: 0.48, 0.64, 1.23
Any idea why it shows something completely different? Why does the HPA show about 200% current CPU utilization? Because of this, it uses the maximum number of replicas even when idle.
The targetCPUUtilizationPercentage of the HPA is a percentage of the CPU requests of the containers of the target Pods. If you don't specify any CPU requests in your Pod specifications, the HPA can't do its calculations.
In your case it seems that the HPA assumes 100m as the CPU requests (or perhaps you have a LimitRange that sets the default CPU request to 100m). The current usage of your Pods is about 200m and that's why the HPA displays a utilisation of about 200%.
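To make the numbers concrete, this Python sketch reproduces the utilization figure from the kubectl top output in the question. The 100m request is the assumption made above, not a value read from the cluster, and the function name is just for illustration.

```python
def average_utilization_percent(usage_millicores, request_millicores):
    """Average CPU usage across pods as a percentage of the per-pod CPU request."""
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100 * average_usage / request_millicores


# CPU(cores) column for the four myapi pods in the question, in millicores.
myapi_usage_m = [194, 193, 200, 203]
assumed_request_m = 100  # assumption: default CPU request of 100m

print(average_utilization_percent(myapi_usage_m, assumed_request_m))  # 197.5
```

That is in line with the 196% in the HPA output, which was presumably sampled at a slightly different moment.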
To set up the HPA correctly, you need to specify CPU requests for your Pods. Something like:
containers:
- image: eu.gcr.io/myproject/my-api:latest
  name: myapi
  imagePullPolicy: Always
  ports:
  - containerPort: 8080
    name: myapi
  resources:
    requests:
      cpu: 500m
Or whatever value your Pods require. If you set the targetCPUUtilizationPercentage to 80, the HPA will trigger an upscale operation at 400m usage, because 80% of 500m is 400m.
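Here is a rough Python sketch of how the replica count would then be derived, using the documented HPA formula desiredReplicas = ceil(currentReplicas * currentUtilization / targetUtilization). The function names and the usage figures are illustrative. The controller also applies a tolerance (10% by default), so in practice scaling starts a little beyond the 400m mark.

```python
import math


def utilization_percent(usage_millicores, request_millicores):
    """Per-pod CPU usage as a percentage of the CPU request."""
    return 100 * usage_millicores / request_millicores


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """Documented HPA rule, ignoring minReplicas/maxReplicas clamping."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no change
    return math.ceil(current_replicas * ratio)


request_m = 500  # the example CPU request from above
target = 80      # targetCPUUtilizationPercentage

# About 200m per pod (as in the question) is only 40% of a 500m request,
# so 4 replicas would be scaled down to 2.
print(desired_replicas(4, utilization_percent(200, request_m), target))  # 2

# 450m per pod is 90% of the request, so 2 replicas would become 3.
print(desired_replicas(2, utilization_percent(450, request_m), target))  # 3
```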
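Besides that, you are using an older version of the HorizontalPodAutoscaler API:
Your version: v1
Newest version at the time of writing: v2beta2 (since Kubernetes 1.23 the stable autoscaling/v2 API is available, and v2beta2 is no longer served as of 1.26)
With the v2beta2 version, the specification looks a bit different. Something like: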
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80
See examples.
However, the CPU utilisation mechanism described above still applies.