Kubernetes Spare/Cold Replica/Pod - kubernetes

I am looking for how to have a spare/cold replica/pod in my Kubernetes configuration. I assume it would go in my Kuberentes deployment or HPA configuration. Any idea how I would make it so I have 2 spare/cold instances of my app always ready, but only get put into the active pods once HPA requests another instance? My goal is to have basically zero startup time on a new pod when HPA says it needs another instance.
apiVersion: apps/v1
kind: Deployment
metadata:
name: someName
namespace: someNamespace
labels:
app: someName
version: "someVersion"
spec:
replicas: $REPLICAS
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: someMaxSurge
maxUnavailable: someMaxUnavailable
selector:
matchLabels:
app: someName
version: someVersion
template:
metadata:
labels:
app: someName
version: "someVersion"
spec:
containers:
- name: someName
image: someImage:someVersion
imagePullPolicy: Always
resources:
limits:
memory: someMemory
cpu: someCPU
requests:
memory: someMemory
cpu: someCPU
readinessProbe:
failureThreshold: someFailureThreshold
initialDelaySeconds: someInitialDelaySeconds
periodSeconds: somePeriodSeconds
timeoutSeconds: someTimeoutSeconds
livenessProbe:
httpGet:
path: somePath
port: somePort
failureThreshold: someFailureThreshold
initialDelaySeconds: someInitialDelay
periodSeconds: somePeriodSeconds
timeoutSeconds: someTimeoutSeocnds
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
name: someName
namespace: someNamespace
spec:
minAvailable: someMinAvailable
selector:
matchLabels:
app: someName
version: "someVersion"
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: someName-hpa
namespace: someNamespace
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: someName
minReplicas: someMinReplicas
maxReplicas: someMaxReplicas
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: someAverageUtilization

I am just wanting to always have 2 spare for scaling, or if one becomes unavailable or any reason
It is a good practice to have at least two replicas for services on Kubernetes. This helps if e.g. a node goes down or you need to do maintenance of the node. Also set Pod Topology Spread Constraints so that those pods are scheduled to run on different nodes.
Set the number of replicas that you minimum want as desired state. In Kubernetes, traffic will be load balanced to the replicas. Also use Horizontal Pod Autoscaler to define when you want to autoscale to more replicas. You can set the requirements low for autoscaling, if you want to scale up early.

Related

GKE "no.scale.down.node.pod.not.enough.pdb" log even with existing PDB

My GKE cluster is displaying "Scale down blocked by pod" note, and clicking it then going to the Logs Explorer it shows a filtered view with log entries for the pods that had the incident: no.scale.down.node.pod.not.enough.pdb . But that's really strange since the pods on the log entries having that message do have PDB defined for them. So it seems to me that GKE is wrongly reporting the cause of the blocking of the node scale down. These are the manifests for one of the pods with this issue:
apiVersion: v1
kind: Service
metadata:
labels:
app: ms-new-api-beta
name: ms-new-api-beta
namespace: beta
spec:
ports:
- port: 8000
protocol: TCP
targetPort: 8000
selector:
app: ms-new-api-beta
type: NodePort
The Deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: ms-new-api-beta
name: ms-new-api-beta
namespace: beta
spec:
selector:
matchLabels:
app: ms-new-api-beta
template:
metadata:
annotations:
cluster-autoscaler.kubernetes.io/safe-to-evict: 'true'
labels:
app: ms-new-api-beta
spec:
containers:
- command:
- /deploy/venv/bin/gunicorn
- '--bind'
- '0.0.0.0:8000'
- 'newapi.app:app'
- '--chdir'
- /deploy/app
- '--timeout'
- '7200'
- '--workers'
- '1'
- '--worker-class'
- uvicorn.workers.UvicornWorker
- '--log-level'
- DEBUG
env:
- name: ENV
value: BETA
image: >-
gcr.io/.../api:${trigger['tag']}
imagePullPolicy: Always
livenessProbe:
failureThreshold: 5
httpGet:
path: /rest
port: 8000
scheme: HTTP
initialDelaySeconds: 120
periodSeconds: 20
timeoutSeconds: 30
name: ms-new-api-beta
ports:
- containerPort: 8000
name: http
protocol: TCP
readinessProbe:
httpGet:
path: /rest
port: 8000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 2
resources:
limits:
cpu: 150m
requests:
cpu: 100m
startupProbe:
failureThreshold: 30
httpGet:
path: /rest
port: 8000
periodSeconds: 120
imagePullSecrets:
- name: gcp-docker-registry
The Horizontal Pod Autoscaler:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: ms-new-api-beta
namespace: beta
spec:
maxReplicas: 5
minReplicas: 2
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: ms-new-api-beta
targetCPUUtilizationPercentage: 100
And finally, the Pod Disruption Budget:
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: ms-new-api-beta
namespace: beta
spec:
minAvailable: 0
selector:
matchLabels:
app: ms-new-api-beta
no.scale.down.node.pod.not.enough.pdb is not complaining about the lack of a PDB. It is complaining that, if the pod is scaled down, it will be in violation of the existing PDB(s).
The "budget" is how much disruption the Pod can permit. The platform will not take any intentional action which violates that budget.
There may be another PDB in place that would be violated. To check, make sure to review pdbs in the pod's namespace:
kubectl get pdb

Horizontal Pod Autoscaler: which the exact value does HPA take?

i have the fowllowing manifests:
The app:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hpa-demo-deployment
labels:
app: hpa-nginx
spec:
replicas: 1
selector:
matchLabels:
app: hpa-nginx
template:
metadata:
labels:
app: hpa-nginx
spec:
containers:
- name: hpa-nginx
image: stacksimplify/kubenginx:1.0.0
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "500Mi"
cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
name: hpa-demo-service-nginx
labels:
app: hpa-nginx
spec:
type: LoadBalancer
selector:
app: hpa-nginx
ports:
- port: 80
targetPort: 80
and its HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: hpa-demo-declarative
spec:
maxReplicas: 10 # define max replica count
minReplicas: 1 # define min replica count
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hpa-demo-deployment
targetCPUUtilizationPercentage: 20 # target CPU utilization
Notice in HPA, the target CPU is set to 20%
My question: which 20% the HPA takes ? is it requests.cpu (ie: 100m) ? or limits.cpu (ie: 200m) ? or something else ?
Thank you!
Its based off of the resources.requests.cpu.
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work

GKE autoscaler overwrites my HorizontalPodAutoscaler in infinite loop

I own a GKE Cluster on GCP, I have 1 node pool with 1 node (4 CPU/16Gb RAM).
Today I tried to scale one of my application to 10 replicas (We want to run lots of concurrent requests on it).
I first edited my horizontalPodAutoscaler.yaml and changed maxReplicas from 5 to 50 and minReplicas from 1 to 10.
Then I edited deployment.yaml and modified spec.replicas from 3 to 10.
Now my deployment is stuck in a loop: It tries to deploy the 10 pods, and as soon as the 10 are ready, it kills 5 of them to go back to 5, in an infinite loop.
Here a the screenshots of the state of the Autoscaler during the loop, it's like it tries to apply 1 configuration and immeditalety the configuration get overwritten by the other.
Here are the config files I am using:
horizontalPodScheduler.yaml
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
labels:
app: my-app
env: production
name: my-app-hpa
namespace: production
spec:
maxReplicas: 50
metrics:
- resource:
name: cpu
targetAverageUtilization: 80
type: Resource
minReplicas: 10
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: my-app
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: my-app
env: production
name: my-app
namespace: production
spec:
replicas: 10
selector:
matchLabels:
app: my-app
env: production
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 25%
type: RollingUpdate
template:
metadata:
labels:
app: my-app
env: production
spec:
nodeSelector:
cloud.google.com/gke-nodepool: my-pool
containers:
- image: gcr.io/my_project_id/github.com/my_org/my-app
imagePullPolicy: IfNotPresent
name: my-app-1
resources:
requests:
cpu: "50m"

Kubernetes HPA wrong metrics?

I've created a GKE test cluster on Google Cloud. It has 3 nodes with 2 vCPUs / 8 GB RAM. I've deployed two java apps on it
Here's the yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapi
spec:
selector:
matchLabels:
app: myapi
strategy:
type: Recreate
template:
metadata:
labels:
app: myapi
spec:
containers:
- image: eu.gcr.io/myproject/my-api:latest
name: myapi
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myapi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myfrontend
spec:
selector:
matchLabels:
app: myfrontend
strategy:
type: Recreate
template:
metadata:
labels:
app: myfrontend
spec:
containers:
- image: eu.gcr.io/myproject/my-frontend:latest
name: myfrontend
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myfrontend
---
Then I wanted to add a HPA with the following details:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: myfrontend
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myfrontend
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 50
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: myapi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapi
minReplicas: 2
maxReplicas: 4
targetCPUUtilizationPercentage: 80
---
If I check kubectl top pods it shows some really weird metrics:
NAME CPU(cores) MEMORY(bytes)
myapi-6fcdb94fd9-m5sh7 194m 1074Mi
myapi-6fcdb94fd9-sptbb 193m 1066Mi
myapi-6fcdb94fd9-x6kmf 200m 1108Mi
myapi-6fcdb94fd9-zzwmq 203m 1074Mi
myfrontend-788d48f456-7hxvd 0m 111Mi
myfrontend-788d48f456-hlfrn 0m 113Mi
HPA info:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
myapi Deployment/myapi 196%/80% 2 4 4 32m
myfrontend Deployment/myfrontend 0%/50% 2 5 2 32m
But If I check uptime on one of the nodes it shows a less lower value:
[myapi#myapi-6fcdb94fd9-sptbb /opt/]$ uptime
09:49:58 up 47 min, 0 users, load average: 0.48, 0.64, 1.23
Any idea why it shows a completely different thing. Why hpa shows 200% of current CPU utilization? And because of this it uses the maximum replicas in idle, too. Any idea?
The targetCPUUtilizationPercentage of the HPA is a percentage of the CPU requests of the containers of the target Pods. If you don't specify any CPU requests in your Pod specifications, the HPA can't do its calculations.
In your case it seems that the HPA assumes 100m as the CPU requests (or perhaps you have a LimitRange that sets the default CPU request to 100m). The current usage of your Pods is about 200m and that's why the HPA displays a utilisation of about 200%.
To set up the HPA correctly, you need to specify CPU requests for your Pods. Something like:
containers:
- image: eu.gcr.io/myproject/my-api:latest
name: myapi
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myapi
resources:
requests:
cpu: 500m
Or whatever value your Pods require. If you set the targetCPUUtilizationPercentage to 80, the HPA will trigger an upscale operation at 400m usage, because 80% of 500m is 400m.
Besides that, you use an outdated version of HorizontalPodAutoscaler:
Your version: v1
Newest version: v2beta2
With the v2beta2 version, the specification looks a bit different. Something like:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: myapi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapi
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
See examples.
However, the CPU utilisation mechanism described above still applies.

How does Kubernetes control replication?

I was curious about how Kubernetes controls replication. I my config yaml file specifies I want three pods, each with an Nginx server, for instance (from here -- https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#how-a-replicationcontroller-works)
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 3
selector:
app: nginx
template:
metadata:
name: nginx
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
How does Kubernetes know when to shut down pods and when to spin up more? For example, for high traffic loads, I'd like to spin up another pod, but I'm not sure how to configure that in the YAML file so I was wondering if Kubernetes has some behind-the-scenes magic that does that for you.
Kubernetes does no magic here - from your configuration, it does simply not know nor does it change the number of replicas.
The concept you are looking for is called an Autoscaler. It uses metrics from your cluster (need to be enabled/installed as well) and can then decide, if Pods must be scaled up or down and will in effect change the number of replicas in the deployment or replication controller. (Please use a deployment, not replication controller, the later does not support rolling updates of your applications.)
You can read more about the autoscaler here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
You can use HorizontalPodAutoscaler along with deployment as below. This will autoscale your pod declaratively based on target CPU utilization.
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: $DEPLOY_NAME
spec:
replicas: 2
template:
metadata:
labels:
app: $DEPLOY_NAME
spec:
containers:
- name: $DEPLOY_NAME
image: $DEPLOY_IMAGE
imagePullPolicy: Always
resources:
requests:
cpu: "0.2"
memory: 256Mi
limits:
cpu: "1"
memory: 1024Mi
---
apiVersion: v1
kind: Service
metadata:
name: $DEPLOY_NAME
spec:
selector:
app: $DEPLOY_NAME
ports:
- port: 8080
type: ClusterIP
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: $DEPLOY_NAME
namespace: $K8S_NAMESPACE
spec:
scaleTargetRef:
apiVersion: apps/v1beta1
kind: Deployment
name: $DEPLOY_NAME
minReplicas: 2
maxReplicas: 6
targetCPUUtilizationPercentage: 60