How does Kubernetes control replication? - kubernetes

I was curious about how Kubernetes controls replication. I my config yaml file specifies I want three pods, each with an Nginx server, for instance (from here -- https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#how-a-replicationcontroller-works)
apiVersion: v1
kind: ReplicationController
metadata:
name: nginx
spec:
replicas: 3
selector:
app: nginx
template:
metadata:
name: nginx
labels:
app: nginx
spec:
containers:
- name: nginx
image: nginx
ports:
- containerPort: 80
How does Kubernetes know when to shut down pods and when to spin up more? For example, for high traffic loads, I'd like to spin up another pod, but I'm not sure how to configure that in the YAML file so I was wondering if Kubernetes has some behind-the-scenes magic that does that for you.

Kubernetes does no magic here - from your configuration, it does simply not know nor does it change the number of replicas.
The concept you are looking for is called an Autoscaler. It uses metrics from your cluster (need to be enabled/installed as well) and can then decide, if Pods must be scaled up or down and will in effect change the number of replicas in the deployment or replication controller. (Please use a deployment, not replication controller, the later does not support rolling updates of your applications.)
You can read more about the autoscaler here: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/

You can use HorizontalPodAutoscaler along with deployment as below. This will autoscale your pod declaratively based on target CPU utilization.
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: $DEPLOY_NAME
spec:
replicas: 2
template:
metadata:
labels:
app: $DEPLOY_NAME
spec:
containers:
- name: $DEPLOY_NAME
image: $DEPLOY_IMAGE
imagePullPolicy: Always
resources:
requests:
cpu: "0.2"
memory: 256Mi
limits:
cpu: "1"
memory: 1024Mi
---
apiVersion: v1
kind: Service
metadata:
name: $DEPLOY_NAME
spec:
selector:
app: $DEPLOY_NAME
ports:
- port: 8080
type: ClusterIP
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: $DEPLOY_NAME
namespace: $K8S_NAMESPACE
spec:
scaleTargetRef:
apiVersion: apps/v1beta1
kind: Deployment
name: $DEPLOY_NAME
minReplicas: 2
maxReplicas: 6
targetCPUUtilizationPercentage: 60

Related

Horizontal Pod Autoscaler: which the exact value does HPA take?

i have the fowllowing manifests:
The app:
apiVersion: apps/v1
kind: Deployment
metadata:
name: hpa-demo-deployment
labels:
app: hpa-nginx
spec:
replicas: 1
selector:
matchLabels:
app: hpa-nginx
template:
metadata:
labels:
app: hpa-nginx
spec:
containers:
- name: hpa-nginx
image: stacksimplify/kubenginx:1.0.0
ports:
- containerPort: 80
resources:
requests:
memory: "128Mi"
cpu: "100m"
limits:
memory: "500Mi"
cpu: "200m"
---
apiVersion: v1
kind: Service
metadata:
name: hpa-demo-service-nginx
labels:
app: hpa-nginx
spec:
type: LoadBalancer
selector:
app: hpa-nginx
ports:
- port: 80
targetPort: 80
and its HPA:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: hpa-demo-declarative
spec:
maxReplicas: 10 # define max replica count
minReplicas: 1 # define min replica count
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: hpa-demo-deployment
targetCPUUtilizationPercentage: 20 # target CPU utilization
Notice in HPA, the target CPU is set to 20%
My question: which 20% the HPA takes ? is it requests.cpu (ie: 100m) ? or limits.cpu (ie: 200m) ? or something else ?
Thank you!
Its based off of the resources.requests.cpu.
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod
https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/#how-does-a-horizontalpodautoscaler-work

StatefulSet: Longer rolling update lead Version mismatching

Application is deployed on K8s using StatefulSet because of stateful in nature. There is around 250+ pods are running and HPA has been implemented on it too that can scale upto 400 pods.
When new deployment occurs, it takes longer time (~ 10-15m) to update all pods in Rolling Update fashion.
Problem: End user get response from 2 version of pods until all pods are replaced with new revision.
I am googling for an architecture where overall deployment time can be reduced and getting the best possible solutions to use BLUE/GREEN strategy but it has bunch of impact with integrated services like monitoring, logging, telemetry etc because of 2 naming conventions.
Ideally I am looking for a solutions like maxSurge for Deployment in which firstly new pods are created and then traffic are shifted to it at a time but in case of StatefulSet, it won't support maxSurge with RollingUpdate strategy & controller will delete and recreate each Pod in the StatefulSet based on ordinal index from bigger to smaller.
The solution is to do a partitioning rolling update along with a canary deployment.
Let’s suppose we have the statefulset workload defined by the following yaml file:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
version: "1.20"
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
version: "1.20"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # Label selector that determines which Pods belong to the StatefulSet
# Must match spec: template: metadata: labels
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx # Pod template's label selector
version: "1.20"
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx:1.20
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
You could patch the statefulset to create a partition, and change the image and version label for the remaining pods: (In this case, since there are only 3 pods, the last one will be the one that will change its image.)
$ kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":2}}}}'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/spec/containers/0/image", "value":"nginx:1.21"}]'
$ kubectl patch statefulset web --type='json' -p='[{"op": "replace", "path": "/spec/template/metadata/labels/version", "value":"1.21"}]'
At this point, you have a pod with the new image and version label ready to use, but since the version label is different, the traffic is still going to the other two pods. If you change the version in the yaml file and apply the new configuration, the rollout will be transparent, since there is already a pod ready to migrate the traffic:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
version: "1.21"
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
version: "1.21"
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
spec:
selector:
matchLabels:
app: nginx # Label selector that determines which Pods belong to the StatefulSet
# Must match spec: template: metadata: labels
serviceName: "nginx"
replicas: 3
template:
metadata:
labels:
app: nginx # Pod template's label selector
version: "1.21"
spec:
terminationGracePeriodSeconds: 10
containers:
- name: nginx
image: nginx:1.21
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
resources:
requests:
storage: 1Gi
$ kubectl apply -f file-name.yaml
Once traffic is migrated to the pod containing the new image and version label, you should patch again the statefulset and remove the partition with the command kubectl patch statefulset web -p '{"spec":{"updateStrategy":{"type":"RollingUpdate","rollingUpdate":{"partition":0}}}}'
Note: You will need to be very careful with the size of the partition, since the remaining pods will handle the whole traffic for some time.

Kubernetes HPA wrong metrics?

I've created a GKE test cluster on Google Cloud. It has 3 nodes with 2 vCPUs / 8 GB RAM. I've deployed two java apps on it
Here's the yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
name: myapi
spec:
selector:
matchLabels:
app: myapi
strategy:
type: Recreate
template:
metadata:
labels:
app: myapi
spec:
containers:
- image: eu.gcr.io/myproject/my-api:latest
name: myapi
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myapi
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: myfrontend
spec:
selector:
matchLabels:
app: myfrontend
strategy:
type: Recreate
template:
metadata:
labels:
app: myfrontend
spec:
containers:
- image: eu.gcr.io/myproject/my-frontend:latest
name: myfrontend
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myfrontend
---
Then I wanted to add a HPA with the following details:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: myfrontend
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myfrontend
minReplicas: 2
maxReplicas: 5
targetCPUUtilizationPercentage: 50
---
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
name: myapi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapi
minReplicas: 2
maxReplicas: 4
targetCPUUtilizationPercentage: 80
---
If I check kubectl top pods it shows some really weird metrics:
NAME CPU(cores) MEMORY(bytes)
myapi-6fcdb94fd9-m5sh7 194m 1074Mi
myapi-6fcdb94fd9-sptbb 193m 1066Mi
myapi-6fcdb94fd9-x6kmf 200m 1108Mi
myapi-6fcdb94fd9-zzwmq 203m 1074Mi
myfrontend-788d48f456-7hxvd 0m 111Mi
myfrontend-788d48f456-hlfrn 0m 113Mi
HPA info:
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
myapi Deployment/myapi 196%/80% 2 4 4 32m
myfrontend Deployment/myfrontend 0%/50% 2 5 2 32m
But If I check uptime on one of the nodes it shows a less lower value:
[myapi#myapi-6fcdb94fd9-sptbb /opt/]$ uptime
09:49:58 up 47 min, 0 users, load average: 0.48, 0.64, 1.23
Any idea why it shows a completely different thing. Why hpa shows 200% of current CPU utilization? And because of this it uses the maximum replicas in idle, too. Any idea?
The targetCPUUtilizationPercentage of the HPA is a percentage of the CPU requests of the containers of the target Pods. If you don't specify any CPU requests in your Pod specifications, the HPA can't do its calculations.
In your case it seems that the HPA assumes 100m as the CPU requests (or perhaps you have a LimitRange that sets the default CPU request to 100m). The current usage of your Pods is about 200m and that's why the HPA displays a utilisation of about 200%.
To set up the HPA correctly, you need to specify CPU requests for your Pods. Something like:
containers:
- image: eu.gcr.io/myproject/my-api:latest
name: myapi
imagePullPolicy: Always
ports:
- containerPort: 8080
name: myapi
resources:
requests:
cpu: 500m
Or whatever value your Pods require. If you set the targetCPUUtilizationPercentage to 80, the HPA will trigger an upscale operation at 400m usage, because 80% of 500m is 400m.
Besides that, you use an outdated version of HorizontalPodAutoscaler:
Your version: v1
Newest version: v2beta2
With the v2beta2 version, the specification looks a bit different. Something like:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
name: myapi
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: myapi
minReplicas: 2
maxReplicas: 4
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 80
See examples.
However, the CPU utilisation mechanism described above still applies.

Kubernetes : Deploy only in one node-pool

I'm currently creating a Kubernetes cluster for a production environment.
In my cluster, I have 2 node-pool, let's call them api-pool and web-pool
In my api-pool, I have 2 nodes with 4CPU and 15Gb of RAM each.
I'm trying to deploy 8 replicas of my api in my api-pool, each replicas should have 1CPU and 3.5Gi of RAM.
My api.deployment.yaml looks something like this :
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-dev
spec:
replicas: 8
selector:
matchLabels:
app: my-api
template:
metadata:
labels:
app: my-api
spec:
containers:
- name: api-docker
image: //MY_IMAGE
imagePullPolicy: Always
envFrom:
- configMapRef:
name: api-dev-env
- secretRef:
name: api-dev-secret
ports:
- containerPort: 80
resources:
requests:
cpu: "1"
memory: "3.5Gi"
But my problem is that Kubernetes is deploying the pods on nodes on my web-pool as well as in my api-pool but I want those pods to be deployed only in my api-pool.
I tried to label my nodes of the api-pool to use a selector that matches labels but it doesn't work and I'm not sure it's supposed to work that way.
How can I precise to K8s to deploy those 8 replicas only in my api-pool ?
You can use a nodeselector which is the simplest recommended form of node selection constraint.
label the nodes of api-pool with pool=api
kubectl label nodes nodename pool=api
Add nodeSelector in pod spec.
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: api-dev
spec:
replicas: 8
selector:
matchLabels:
app: my-api
template:
metadata:
labels:
app: my-api
spec:
containers:
- name: api-docker
image: //MY_IMAGE
imagePullPolicy: Always
envFrom:
- configMapRef:
name: api-dev-env
- secretRef:
name: api-dev-secret
ports:
- containerPort: 80
resources:
requests:
cpu: "1"
memory: "3.5Gi"
nodeSelector:
pool: api
For mode advanced use cases you can use node affinity.

ActiveMQ on Kuberenetes with Shared storage

I have existing applications built with Apache Camel and ActiveMQ. As part of migration to Kubernetes, what we are doing is moving the same services developed with Apache Camel to Kubernetes. I need to deploy ActiveMQ such that I do not lose the data in case one of the Pod dies.
What I am doing now is running a deployment with RelicaSet value to 2. This will start 2 pods and with a Service in front, I can serve any request while atleast 1 Pod is up. However, if one Pod dies, i do not want to lose the data. I want to implement something like a shared file system between the Pods. My environment is in AWS so I can use EBS. Can you suggest, how to achieve that.
Below is my deployment and service YAML.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: smp-activemq
spec:
replicas: 1
template:
metadata:
labels:
app: smp-activemq
spec:
containers:
- name: smp-activemq
image: dasdebde/activemq:5.15.9
imagePullPolicy: IfNotPresent
ports:
- containerPort: 61616
resources:
limits:
memory: 512Mi
---
apiVersion: v1
kind: Service
metadata:
name: smp-activemq
spec:
type: NodePort
selector:
app: smp-activemq
ports:
- nodePort: 32191
port: 61616
targetPort: 61616
StatefulSets are valuable for applications that require stable, persistent storage. Deleting and/or scaling a StatefulSet down will not delete the volumes associated with the StatefulSet. This is done to ensure data safety. The "volumeClaimTemplates" part in yaml will provide stable storage using PersistentVolumes provisioned by a PersistentVolume Provisioner.
In your case, StatefulSet file definition will look similar to this:
apiVersion: v1
kind: Service
metadata:
name: smp-activemq
labels:
app: smp-activemq
spec:
type: NodePort
selector:
app: smp-activemq
ports:
- nodePort: 32191
port: 61616
name: smp-activemq
targetPort: 61616
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: smp-activemq
spec:
selector:
matchLabels:
app: smp-activemq
serviceName: smp-activemq
replicas: 1
template:
metadata:
labels:
app: smp-activemq
spec:
containers:
- name: smp-activemq
image: dasdebde/activemq:5.15.9
imagePullPolicy: IfNotPresent
ports:
- containerPort: 61616
name: smp-activemq
volumeMounts:
- name: www
mountPath: <mount-path>
volumeClaimTemplates:
- metadata:
name: www
spec:
accessModes: [ "ReadWriteOnce" ]
storageClassName: "<storageclass-name>"
resources:
requests:
storage: 1Gi
That what you need to define is your StorageClass name and mountPath. I hope it will helps you.
In high-level terms, what you want is a StatefulSet instead of a Deployment for your ActiveMQ. You are correct that you want "shared file system" -- in kubernetes this is expressed as a "Persistent Volume", which is made available to the pods in your StatefulSet using a "Volume Mount".
These are the things you need to look up.