Resource allocation to containers in Kubernetes pods

Consider the .yaml file below:
application/guestbook/redis-slave-deployment.yaml
apiVersion: apps/v1 # for versions before 1.9.0 use apps/v1beta2
kind: Deployment
metadata:
  name: redis-slave
  labels:
    app: redis
spec:
  selector:
    matchLabels:
      app: redis
      role: slave
      tier: backend
  replicas: 2
  template:
    metadata:
      labels:
        app: redis
        role: slave
        tier: backend
    spec:
      containers:
      - name: slave
        image: gcr.io/google_samples/gb-redisslave:v1
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        env:
        - name: GET_HOSTS_FROM
          value: dns
        ports:
        - containerPort: 6379
The resources section isn't clear to me. If I have 16GB of RAM and a 4-core CPU, each core at 2GHz, how much do the requests above actually reserve?

So you have a total of 4 CPU cores and 16GB RAM. This Deployment will start two Pods (replicas), and each will reserve 0.1 cores (100m) and 100Mi (roughly 0.1GB) of memory on the node it is scheduled to. In total 0.2 cores and about 0.2GB are reserved, leaving roughly 3.8 cores and 15.8GB. However, actual usage may exceed the reservation, as this is only the requested amount. To specify an upper bound you use a limits section.
It can be counter-intuitive that CPU allocation is based on cores rather than GHz; there is a fuller explanation in the GCP docs and more on the arithmetic in the official Kubernetes docs.
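For illustration, a limits block could sit alongside the existing requests; the 250m and 256Mi figures below are placeholders for this sketch, not values from the original manifest:
        resources:
          requests:
            cpu: 100m        # scheduler reserves 0.1 core per replica
            memory: 100Mi    # scheduler reserves ~100MB per replica
          limits:
            cpu: 250m        # container is throttled above 0.25 core
            memory: 256Mi    # container is OOM-killed if it exceeds this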

Related

Ephemeral volume limit causing pod to error out

I'm working on a task where I want to limit the ephemeral volume to a certain number of Gi.
This is my deployment configuration file.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vol
  namespace: namespace1
  labels:
    app: vol
spec:
  replicas: 1
  selector:
    matchLabels:
      app: vol
  template:
    metadata:
      labels:
        app: vol
    spec:
      containers:
      - name: vol
        image: <my-image>
        ports:
        - containerPort: 5000
        resources:
          limits:
            ephemeral-storage: "1Gi"
        volumeMounts:
        - name: ephemeral
          mountPath: "/volume"
      volumes:
      - name: ephemeral
        emptyDir: {}
The expected behaviour is that when the volume limit is exceeded the pod gets evicted, which is happening as expected.
The only problem I have is that after the default termination grace period the pod ends up in an error state with an ExceededGracePeriod warning. So now I have one pod running and one pod in an error state in my deployment.
I have tried increasing terminationGracePeriodSeconds, using a preStop hook, and setting a limit on the emptyDir: {} as well, but nothing has worked for me.
You can increase the ephemeral-storage limit, for example to 2Gi; this might resolve your error. Refer to the Kubernetes documentation for quotas and limit ranges for ephemeral storage, and for the details of how ephemeral storage consumption management works.
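As a rough sketch of that suggestion (the 2Gi value mirrors the advice above; the sizeLimit on the emptyDir is an optional, standard field that was not in your original manifest), the relevant parts of the spec could look like:
        resources:
          limits:
            ephemeral-storage: "2Gi"   # raised from 1Gi
      volumes:
      - name: ephemeral
        emptyDir:
          sizeLimit: "2Gi"             # optional cap on the emptyDir volume itself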

Why is the number of running pods reduced while the number of messages in the queue is still high?

I am running a KEDA-enabled Kubernetes cluster with an Azure Storage Queue-triggered Function which processes 2000 messages at once, with variable execution times (between 30 seconds and 10 minutes).
The system works fine for a while, but when the number of messages in the queue starts decreasing, the number of pods also starts decreasing.
Question: Why is the number of running pods reduced while the number of messages in the queue is still high?
Example: The maximum replica count is 120 and the number of messages in the queue is around 250. So why does the HPA scale down the pods? Ideally, it should still be using all 120 pods to work through the queue.
Supplementary questions:
Does queueLength affect the behavior of the HPA?
A few messages were not fully processed. It seems they were shut down abruptly mid-execution even though terminationGracePeriodSeconds is 10 minutes. Any comments?
Following is the scaledobject.yaml file:
apiVersion: v1
kind: Secret
metadata:
  name: queue-connection-secret
data:
  connection-string: ####
---
apiVersion: keda.sh/v1alpha1
kind: TriggerAuthentication
metadata:
  name: azure-queue-auth
spec:
  secretTargetRef:
  - parameter: connection
    name: queue-connection-secret
    key: connection-string
---
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: queuetrigfuncscaledobject
spec:
  scaleTargetRef:
    name: queuetrigfuncdeployment
  minReplicaCount: 0
  maxReplicaCount: 120
  pollingInterval: 1
  cooldownPeriod: 900
  triggers:
  - type: azure-queue
    metadata:
      queueName: k8s-poc-queue
      queueLength: "1"
    authenticationRef:
      name: azure-queue-auth
Deployments.yaml file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: queuetrigfuncdeployment
  labels:
    app: queuetrigfuncpod
spec:
  selector:
    matchLabels:
      app: queuetrigfuncpod
  template:
    metadata:
      labels:
        app: queuetrigfuncpod
    spec:
      containers:
      - image: #######.azurecr.io/#######
        name: queuetrigcontainer
        ports:
        - containerPort: 80
        resources:
          requests:
            memory: "500Mi"
            cpu: "700m"
          limits:
            memory: "600Mi"
            cpu: "700m"
      nodeSelector:
        agentpool: testuserpool
      terminationGracePeriodSeconds: 600
      imagePullSecrets:
      - name: regcred
That scaler uses AverageValue mode, so the queue length gets divided by the number of pods. You want Value mode instead, but there is no option for that currently.
This is an overall issue in KEDA that we keep meaning to fix globally, but no one has had the time. If you put up a PR, ping me in Slack and I can review it.

Why is memory usage greater than what I set on my Kubernetes node?

I allocated resources to only 1 pod, with 650MB (30%) of memory (together with the other built-in pods, the memory limit comes to only 69%).
However, when the pod is handling a task, its usage stays within 650MB, but the overall usage of the node reaches 94%.
Why does this happen when the upper limit is supposed to be 69%? Is it due to other built-in pods which did not set a limit? How can I prevent this, since sometimes my pod errors out when the node's memory usage goes above 100%?
[Screenshots: my allocation settings from kubectl describe nodes, and the memory usage of the node and pods from kubectl top nodes / kubectl top pods, both when idle and when running the task.]
Further tested behaviour:
1. Prepare a deployment, pods and a service under the namespace test-ns.
2. Since only kube-system and test-ns have pods, assign a default limit of 1000Mi to each of them (confirmed via kubectl describe nodes), aiming to stay under 2GB in total.
3. If the memory used by kube-system and test-ns should be less than 2GB, i.e. less than 100%, why can the memory usage reach 106%?
The .yaml file:
apiVersion: v1
kind: LimitRange
metadata:
  name: default-mem-limit
  namespace: test-ns
spec:
  limits:
  - default:
      memory: 1000Mi
    type: Container
---
apiVersion: v1
kind: LimitRange
metadata:
  name: default-mem-limit
  namespace: kube-system
spec:
  limits:
  - default:
      memory: 1000Mi
    type: Container
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: devops-deployment
  namespace: test-ns
  labels:
    app: devops-pdf
spec:
  selector:
    matchLabels:
      app: devops-pdf
  replicas: 2
  template:
    metadata:
      labels:
        app: devops-pdf
    spec:
      containers:
      - name: devops-pdf
        image: dev.azurecr.io/devops-pdf:latest
        imagePullPolicy: Always
        ports:
        - containerPort: 3000
        resources:
          requests:
            cpu: 600m
            memory: 500Mi
          limits:
            cpu: 600m
            memory: 500Mi
      imagePullSecrets:
      - name: regcred
---
apiVersion: v1
kind: Service
metadata:
  name: devops-pdf
  namespace: test-ns
spec:
  type: LoadBalancer
  ports:
  - port: 8007
  selector:
    app: devops-pdf
This effect is most likely caused by the 4 Pods that run on that node without a memory limit specified, shown as 0 (0%). Of course 0 doesn't mean the pod can't use even a single byte of memory, as no program can start without using memory; instead it means there is no limit, and it can use as much as is available. Also, programs not running in a pod (ssh, cron, ...) are included in the total used figure, but are not limited by Kubernetes (by cgroups).
Kubernetes sets up the kernel OOM adjustment values in a tricky way to favour containers that are under their memory request, making it more likely to kill processes in containers that are between their memory request and limit, and most likely to kill processes in containers with no memory limits. However, this is only shown to work fairly in the long run, and sometimes the kernel can kill your favourite process in your favourite container that is behaving well (using less than its memory request). See https://kubernetes.io/docs/tasks/administer-cluster/out-of-resource/#node-oom-behavior
The pods without a memory limit in this particular case come from the AKS system itself, so setting their memory limit in the pod templates is not an option, as there is a reconciler that will (eventually) restore it. To remedy the situation I suggest that you create a LimitRange object in the kube-system namespace that will assign a memory limit to all pods without one (as they are created):
apiVersion: v1
kind: LimitRange
metadata:
  name: default-mem-limit
  namespace: kube-system
spec:
  limits:
  - default:
      memory: 150Mi
    type: Container
(You will need to delete the already existing Pods without a memory limit for this to take effect; they will be recreated.)
This is not going to eliminate the problem completely, as you might still end up with an overcommitted node; however, the memory usage will make sense and the OOM events will be more predictable.

GCloud Kubernetes cluster with "1 Insufficient cpu" error

I created a Kubernetes cluster on Google Cloud using:
gcloud container clusters create my-app-cluster --num-nodes=1
Then I deployed my 3 apps (backend, frontend and a scraper) and created a load balancer. I used the following configuration file:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app-deployment
  labels:
    app: my-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app-server
        image: gcr.io/my-app/server
        ports:
        - containerPort: 8009
        envFrom:
        - secretRef:
            name: my-app-production-secrets
      - name: my-app-scraper
        image: gcr.io/my-app/scraper
        ports:
        - containerPort: 8109
        envFrom:
        - secretRef:
            name: my-app-production-secrets
      - name: my-app-frontend
        image: gcr.io/my-app/frontend
        ports:
        - containerPort: 80
        envFrom:
        - secretRef:
            name: my-app-production-secrets
---
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb-service
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - name: my-app-server-port
    protocol: TCP
    port: 8009
    targetPort: 8009
  - name: my-app-scraper-port
    protocol: TCP
    port: 8109
    targetPort: 8109
  - name: my-app-frontend-port
    protocol: TCP
    port: 80
    targetPort: 80
When typing kubectl get pods I get:
NAME READY STATUS RESTARTS AGE
my-app-deployment-6b49c9b5c4-5zxw2 0/3 Pending 0 12h
When investigating in Google Cloud I see an "Unschedulable" state with an "insufficient cpu" error on the pod:
When going to the Nodes section under my cluster on the Clusters page, I see 681 mCPU requested and 940 mCPU allocatable:
What is wrong? Why doesn't my pod start?
Every container has a default CPU request (in GKE I've noticed it's 0.1 CPU, or 100m). Assuming these defaults, the three containers in that pod are requesting another 0.3 CPU.
The node has 0.68 CPU (680m) requested by other workloads and a total limit (allocatable) on that node of 0.94 CPU (940m).
If you want to see what workloads are reserving that 0.68 CPU, you need to inspect the pods on the node. In the page on GKE where you see the resource allocations and limits per node, if you click the node it will take you to a page that provides this information.
In my case I can see 2 pods of kube-dns taking 0.26 CPU each, amongst others. These are system pods that are needed to operate the cluster correctly. What you see will also depend on what add-on services you have selected, for example: HTTP Load Balancing (Ingress), Kubernetes Dashboard and so on.
Your pod would take the node's requested CPU to 0.98, which is more than the 0.94 allocatable, and that is why your pod cannot start.
Note that the scheduling is based on the amount of CPU requested for each workload, not how much it actually uses, or the limit.
Your options:
1. Turn off any add-on service you don't need that is taking CPU resource.
2. Add more CPU resource to your cluster. To do that you will either need to change your node pool to use VMs with more CPU, or increase the number of nodes in your existing pool. You can do this in the GKE console or via the gcloud command line.
3. Make explicit requests in your containers for less CPU; these will override the defaults, for example:
apiVersion: apps/v1
kind: Deployment
...
spec:
  containers:
  - name: my-app-server
    image: gcr.io/my-app/server
    ...
    resources:
      requests:
        cpu: "50m"
  - name: my-app-scraper
    image: gcr.io/my-app/scraper
    ...
    resources:
      requests:
        cpu: "50m"
  - name: my-app-frontend
    image: gcr.io/my-app/frontend
    ...
    resources:
      requests:
        cpu: "50m"

MongoDB RAM Usage in Kubernetes Pods - Not Aware of Node Limits

In Google Container Engine's Kubernetes I have 3 nodes, each with 3.75 GB of RAM.
I also have an API that is called from a single endpoint; this endpoint makes batch inserts into MongoDB like this:
IMongoCollection<T> stageCollection = Database.GetCollection<T>(StageName);

foreach (var batch in entites.Batch(1000))
{
    await stageCollection.InsertManyAsync(batch);
}
It happens very often that we end up in out-of-memory scenarios.
On the one hand we limited wiredTigerCacheSizeGB to 1.5, and on the other hand we defined a resource limit for the pod.
But we still get the same result.
To me it looks like MongoDB isn't aware of the memory limit the pod has.
Is this a known issue? How do I deal with it without scaling to "monster" machines?
The configuration YAML looks like this:
---
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: Service
metadata:
  name: mongo
  labels:
    name: mongo
spec:
  ports:
  - port: 27017
    targetPort: 27017
  clusterIP: None
  selector:
    role: mongo
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: "mongo"
  replicas: 1
  template:
    metadata:
      labels:
        role: mongo
        environment: test
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: mongo
        image: mongo:3.6
        command:
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip"
        - "0.0.0.0"
        - "--noprealloc"
        - "--wiredTigerCacheSizeGB"
        - "1.5"
        resources:
          limits:
            memory: "2Gi"
        ports:
        - containerPort: 27017
        volumeMounts:
        - name: mongo-persistent-storage
          mountPath: /data/db
      - name: mongo-sidecar
        image: cvallance/mongo-k8s-sidecar
        env:
        - name: MONGO_SIDECAR_POD_LABELS
          value: "role=mongo,environment=test"
  volumeClaimTemplates:
  - metadata:
      name: mongo-persistent-storage
      annotations:
        volume.beta.kubernetes.io/storage-class: "fast"
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 32Gi
UPDATE
In the meantime I also configured pod anti-affinity to make sure that on the nodes where MongoDB is running we don't have any interference in RAM, but we still get the OOM scenarios.
I'm facing a similar issue, where the pod gets OOMKilled even with resource limits and the WiredTiger cache limit set.
This PR tackles the issue of MongoDB sizing its cache from the node's memory rather than the container's memory limit.
In your case my advice is to update the MongoDB container image to a more recent version (the PR targets 3.6.13, and you are running 3.6).
It may still be the case that your pod gets OOMKilled, given that I'm using 4.4.10 and still facing this issue.
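As a rough sketch of that suggestion (the image tag and the cache value below are placeholder choices for illustration, not a verified fix), the container section of the StatefulSet might become:
      containers:
      - name: mongo
        image: mongo:4.4              # placeholder tag: any release containing the cgroup-aware cache sizing fix
        command:
        - mongod
        - "--replSet"
        - rs0
        - "--bind_ip"
        - "0.0.0.0"
        # --noprealloc from the original manifest is omitted; newer server versions may not accept it
        - "--wiredTigerCacheSizeGB"
        - "1"                         # keep the cache well below the container memory limit
        resources:
          requests:
            memory: "2Gi"             # matching the limit so the full amount is reserved on the node
          limits:
            memory: "2Gi"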