I've created a GKE test cluster on Google Cloud. It has 3 nodes with 2 vCPUs / 8 GB RAM each, and I've deployed two Java apps on it.
Here's the YAML file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapi
spec:
  selector:
    matchLabels:
      app: myapi
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myapi
    spec:
      containers:
        - image: eu.gcr.io/myproject/my-api:latest
          name: myapi
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: myapi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myfrontend
spec:
  selector:
    matchLabels:
      app: myfrontend
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: myfrontend
    spec:
      containers:
        - image: eu.gcr.io/myproject/my-frontend:latest
          name: myfrontend
          imagePullPolicy: Always
          ports:
            - containerPort: 8080
              name: myfrontend
---
Then I wanted to add an HPA with the following details:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myfrontend
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myfrontend
  minReplicas: 2
  maxReplicas: 5
  targetCPUUtilizationPercentage: 50
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  targetCPUUtilizationPercentage: 80
---
If I check kubectl top pods, it shows some really weird metrics:
NAME                          CPU(cores)   MEMORY(bytes)
myapi-6fcdb94fd9-m5sh7        194m         1074Mi
myapi-6fcdb94fd9-sptbb        193m         1066Mi
myapi-6fcdb94fd9-x6kmf        200m         1108Mi
myapi-6fcdb94fd9-zzwmq        203m         1074Mi
myfrontend-788d48f456-7hxvd   0m           111Mi
myfrontend-788d48f456-hlfrn   0m           113Mi
HPA info:
NAME         REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
myapi        Deployment/myapi        196%/80%   2         4         4          32m
myfrontend   Deployment/myfrontend   0%/50%     2         5         2          32m
But if I check uptime in one of the pods, it shows a much lower load:
[myapi#myapi-6fcdb94fd9-sptbb /opt/]$ uptime
09:49:58 up 47 min, 0 users, load average: 0.48, 0.64, 1.23
Any idea why these show completely different things? Why does the HPA report around 200% CPU utilization? Because of that, it also runs at the maximum replica count while idle. Any idea?
The targetCPUUtilizationPercentage of the HPA is a percentage of the CPU requests of the containers of the target Pods. If you don't specify any CPU requests in your Pod specifications, the HPA can't do its calculations.
In your case it seems that the HPA assumes 100m as the CPU requests (or perhaps you have a LimitRange that sets the default CPU request to 100m). The current usage of your Pods is about 200m and that's why the HPA displays a utilisation of about 200%.
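You can check whether such a LimitRange is in effect; as a quick sanity check (the --all-namespaces flag is only needed if your workloads don't run in the default namespace):
kubectl describe limitrange --all-namespaces
If a LimitRange with a default CPU request of 100m shows up, that would explain your numbers exactly.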
To set up the HPA correctly, you need to specify CPU requests for your Pods. Something like:
containers:
  - image: eu.gcr.io/myproject/my-api:latest
    name: myapi
    imagePullPolicy: Always
    ports:
      - containerPort: 8080
        name: myapi
    resources:
      requests:
        cpu: 500m
Or whatever value your Pods require. If you set the targetCPUUtilizationPercentage to 80, the HPA will trigger an upscale operation at 400m usage, because 80% of 500m is 400m.
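You can also plug your own numbers into the HPA's scaling formula to see why the Deployment is pinned at its maximum. With the 196%/80% from your kubectl get hpa output:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
                = ceil[2 * ( 196 / 80 )] = ceil[4.9] = 5
which is then capped at maxReplicas = 4, so the autoscaler keeps all 4 replicas even while the cluster is idle.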
Besides that, you're using an outdated version of HorizontalPodAutoscaler:
Your version: v1
Newest version: v2beta2 (on Kubernetes 1.23 and later this has since graduated to the stable autoscaling/v2)
With the v2beta2 version, the specification looks a bit different. Something like:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: myapi
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapi
  minReplicas: 2
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
See the examples in the official HorizontalPodAutoscaler documentation.
However, the CPU utilisation mechanism described above still applies.
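As a sanity check on the fix: with a 500m request and the roughly 200m idle usage from your kubectl top pods output, the HPA should report about 200m / 500m = 40% utilisation, comfortably below the 80% target, and kubectl get hpa should then settle back at minReplicas. (The 40% is just the arithmetic from your numbers, not guaranteed output.)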
Related
I have a cluster where I have deployed multiple applications, and I want to horizontally scale one of the deployments.
Below is my YAML for the deployment; how can I achieve that?
Note: I have tried changing the replicas to more than 1, applied the new config, and restarted the deployment, but I want to know if I need to add any policies, specs, etc. to achieve proper horizontal scaling.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: preview
  namespace: default
  resourceVersion: {}
  uid: {}
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: preview
  strategy:
    type: Recreate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: preview
    spec:
      containers:
        - image: gcr.io/{project name}/{image name}
          imagePullPolicy: Always
          name: preview
          resources:
            requests:
              cpu: 10m
              memory: 450Mi
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
          volumeMounts:
            - mountPath: /app/data
              name: data
            - mountPath: /app/conf
              name: config
              readOnly: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: preview
        - name: config
          secret:
            defaultMode: 420
            secretName: preview-secrets
You can use the HPA (Horizontal Pod Autoscaler). Here is what a typical YAML configuration looks like:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-name                        # your HPA's name (underscores are not valid in object names)
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: deployment-name-to-autoscale  # the Deployment to scale, e.g. preview
  minReplicas: 1
  maxReplicas: 3
  targetCPUUtilizationPercentage: 80
You can monitor the scaling with kubectl get hpa.
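If you prefer not to maintain a separate manifest, the equivalent HPA can also be created imperatively (a sketch, assuming your Deployment is named preview as in your manifest above):
kubectl autoscale deployment preview --cpu-percent=80 --min=1 --max=3
This creates the same autoscaling/v1 object as the YAML shown above.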
In GKE, you can achieve this with the Horizontal Pod Autoscaler (HPA). The autoscaling event can be triggered by system metrics (e.g. CPU or memory) or custom metrics (e.g. the count of queued Pub/Sub messages). You can also set the minimum and maximum number of pods to scale to.
Here is a link from GCP for a sample HPA YAML file.
You can also configure it in the console: Menu > GKE > Workloads > click on your deployment > 3 dots (more actions) > Actions > Autoscale > set metrics > Save
Here is another example of using the HPA with an external metric (e.g. Cloud Pub/Sub):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: your-hpa-name
spec:
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - external:
        metric:
          name: pubsub.googleapis.com|subscription|num_undelivered_messages
          selector:
            matchLabels:
              resource.labels.subscription_id: your-pubsub-subscription-name
        target:
          type: AverageValue
          averageValue: 200
      type: External
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: your-deployment-name
I have the following manifests:
The app:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hpa-demo-deployment
  labels:
    app: hpa-nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: hpa-nginx
  template:
    metadata:
      labels:
        app: hpa-nginx
    spec:
      containers:
        - name: hpa-nginx
          image: stacksimplify/kubenginx:1.0.0
          ports:
            - containerPort: 80
          resources:
            requests:
              memory: "128Mi"
              cpu: "100m"
            limits:
              memory: "500Mi"
              cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
  name: hpa-demo-service-nginx
  labels:
    app: hpa-nginx
spec:
  type: LoadBalancer
  selector:
    app: hpa-nginx
  ports:
    - port: 80
      targetPort: 80
and its HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-demo-declarative
spec:
  maxReplicas: 10 # define max replica count
  minReplicas: 1  # define min replica count
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: hpa-demo-deployment
  targetCPUUtilizationPercentage: 20 # target CPU utilization
Notice that in the HPA, the target CPU is set to 20%.
My question: 20% of which value does the HPA take? Is it requests.cpu (i.e. 100m)? Or limits.cpu (i.e. 200m)? Or something else?
Thank you!
It's based on resources.requests.cpu.
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work
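Concretely, with the manifests above: 20% of the 100m CPU request is 20m, so the HPA scales out once average usage across the pods exceeds roughly 20m. The limits.cpu of 200m plays no role in the utilization calculation.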
I am deploying a microservice application I built using Node.
Issue
The pods won't autoscale when I put load on them using JMeter. The CPU utilization goes to 50m, which doesn't cause the HPA to start autoscaling. I want it to start replicating as soon as it reaches 80% of the CPU request (which is 10m).
HPA config:
# apiVersion: autoscaling/v1
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: client-hpa
  namespace: default
spec:
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
  scaleTargetRef:
    # apiVersion: apps/v1beta1
    apiVersion: apps/v1
    kind: Deployment
    name: client-depl
Deployment config:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: client-depl
spec:
  replicas: 1
  selector:
    matchLabels:
      app: client
  template:
    metadata:
      labels:
        app: client
    spec:
      containers:
        - name: client
          image: <docker-id>/<image-name>
          resources:
            requests:
              memory: 350Mi
              cpu: 10m ### I want it to autoscale when it reaches 8m ###
Also, kubectl get hpa shows the following output:
$ kubectl get hpa
NAME         REFERENCE                TARGETS         MINPODS   MAXPODS   REPLICAS   AGE
client-hpa   Deployment/client-depl   <unknown>/80%   1         4         1          8m32s
The HPA is based on the following equation:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
As you can see, the equation works on the ratio of the current metric value to the desired one, not on an absolute CPU quantity, so with a Utilization target you cannot ask for scaling at an absolute value like 8m directly; you can only express it as a percentage of the request. Note also that the <unknown> in your TARGETS column means the HPA is currently receiving no CPU metrics for the pods at all (often a metrics-server problem); no scaling decision can be made until that column shows a number.
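For illustration, plugging your numbers into the equation: a 10m request with 50m of usage is 500% utilization, so once metrics are flowing the HPA would compute ceil[1 * ( 500 / 80 )] = 7 and then cap the result at your maxReplicas of 4.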
That said, perhaps the following may be of interest to you:
Vertical Pod Autoscaling
I was curious about how Kubernetes controls replication. My config YAML file specifies that I want three pods, each with an Nginx server, for instance (from here: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#how-a-replicationcontroller-works):
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx
  template:
    metadata:
      name: nginx
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
How does Kubernetes know when to shut down pods and when to spin up more? For example, for high traffic loads, I'd like to spin up another pod, but I'm not sure how to configure that in the YAML file so I was wondering if Kubernetes has some behind-the-scenes magic that does that for you.
Kubernetes does no magic here: from your configuration alone, it simply does not know about load, nor does it change the number of replicas.
The concept you are looking for is called an autoscaler. It uses metrics from your cluster (which need to be enabled/installed as well) and can then decide whether Pods must be scaled up or down, changing the number of replicas in the Deployment or ReplicationController accordingly. (Please use a Deployment, not a ReplicationController; the latter does not support rolling updates of your applications.)
You can read more about the autoscaler here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
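As a minimal sketch, here is the same nginx workload expressed as a Deployment (same labels, image, and ports; only the apiVersion, kind, and selector shape change):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx
          ports:
            - containerPort: 80
An autoscaler can then target this Deployment, as the other answers here show.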
You can use a HorizontalPodAutoscaler along with a Deployment, as below. This will autoscale your pods declaratively based on target CPU utilization.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: $DEPLOY_NAME
spec:
  replicas: 2
  selector:            # required with apps/v1 (the removed apps/v1beta1 used to default it)
    matchLabels:
      app: $DEPLOY_NAME
  template:
    metadata:
      labels:
        app: $DEPLOY_NAME
    spec:
      containers:
        - name: $DEPLOY_NAME
          image: $DEPLOY_IMAGE
          imagePullPolicy: Always
          resources:
            requests:
              cpu: "0.2"
              memory: 256Mi
            limits:
              cpu: "1"
              memory: 1024Mi
---
apiVersion: v1
kind: Service
metadata:
  name: $DEPLOY_NAME
spec:
  selector:
    app: $DEPLOY_NAME
  ports:
    - port: 8080
  type: ClusterIP
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: $DEPLOY_NAME
  namespace: $K8S_NAMESPACE
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: $DEPLOY_NAME
  minReplicas: 2
  maxReplicas: 6
  targetCPUUtilizationPercentage: 60
I have successfully followed the documentation here and here to deploy an API spec and GKE backend to Cloud Endpoints.
This has left me with a deployment.yaml that looks like this:
apiVersion: v1
kind: Service
metadata:
  name: esp-myproject
spec:
  ports:
    - port: 80
      targetPort: 8081
      protocol: TCP
      name: http
  selector:
    app: esp-myproject
  type: LoadBalancer
---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: esp-myproject
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: esp-myproject
    spec:
      containers:
        - name: esp
          image: gcr.io/endpoints-release/endpoints-runtime:1
          args: [
            "--http_port=8081",
            "--backend=127.0.0.1:8080",
            "--service=myproject1-0-0.endpoints.myproject.cloud.goog",
            "--rollout_strategy=managed",
          ]
          ports:
            - containerPort: 8081
        - name: myproject
          image: gcr.io/myproject/my-image:v0.0.1
          ports:
            - containerPort: 8080
This creates a single replica of the app on the backend. So far, so good...
I now want to update the yaml file to declaratively specify auto-scaling parameters to enable multiple replicas of the app to run alongside each other when traffic to the endpoint justifies more than one.
I have read around (O'Reilly book: Kubernetes Up & Running, GCP docs, K8s docs), but there are two things on which I'm stumped:
I've read a number of times about the HorizontalPodAutoscaler, but it's not clear to me whether the deployment must make use of it in order to enjoy the benefits of autoscaling.
If so, I have seen examples in the docs of how to define the spec for the HorizontalPodAutoscaler in YAML, as shown below, but how would I combine this with my existing deployment.yaml?
HorizontalPodAutoscaler example (from the docs):
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Thanks in advance to anyone who can shed some light on this for me.
I've read a number of times about the HorizontalPodAutoscaler, but it's not clear to me whether the deployment must make use of it in order to enjoy the benefits of autoscaling.
It doesn't have to, but it's recommended, and it's already built in. You could build your own automation that scales up and down, but there is little reason to, since this is exactly what the HPA already supports.
If so, I have seen examples in the docs of how to define the spec for the HorizontalPodAutoscaler in YAML, as shown below, but how would I combine this with my existing deployment.yaml?
It should be straightforward. You basically reference your deployment in the HPA definition:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: my-esp-project-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: esp-myproject  # <== here, the name of your existing Deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
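One caveat: as discussed earlier on this page, a Utilization target only works if the containers in the target Deployment declare CPU requests, and the esp-myproject Deployment above currently declares none. A sketch of what to add to each container (the values are placeholders, not recommendations):
resources:
  requests:
    cpu: 100m
With two containers in the pod (esp and myproject), the HPA compares the pod's total CPU usage against the sum of the containers' requests.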
I faced the same issue; here is what worked for me.
If you are on GKE (roughly versions 1.12 to 1.14) and the only enabled API versions are:
autoscaling/v1
autoscaling/v2beta1
then you won't be able to apply an autoscaling/v2beta2 manifest. However, you can achieve the same thing with autoscaling/v2beta1, something like:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: core-deployment
  namespace: default
spec:
  maxReplicas: 9
  minReplicas: 5
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: core-deployment
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageValue: 500m
If you want to scale based on utilization instead:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: core-deployment
  namespace: default
spec:
  maxReplicas: 9
  minReplicas: 5
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: core-deployment
  metrics:
    - type: Resource
      resource:
        name: cpu
        targetAverageUtilization: 80
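Either way, you can watch the autoscaler's decisions and scaling events with:
kubectl describe hpa core-deployment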