Kubernetes Pod memory limits

I have a question about Rancher Desktop's Kubernetes sizing for CPU / memory...
I normally use k3d, but I'd like to give Rancher Desktop's Kubernetes a go.
I have a problem with the sizing of the Kubernetes container in Rancher Desktop.
k3d has a --servers-memory option which correctly sets the allowed memory, as I can see with 'kubectl describe nodes':
Capacity:
  cpu:                4
  ephemeral-storage:  102625208Ki
  ....
  memory:             17179869Ki
  pods:               110
You can see that k3d allocates the configured memory to the Kubernetes container...
I can't find a way / parameter to configure this in Rancher Desktop; there is a VM Resources configuration, but it does not affect the sizing of the Kubernetes container (the CPU count matches what I set in Resources, but the memory does not).
So for Rancher Desktop Kubernetes (RDK), 'kubectl describe nodes' shows:
Capacity:
  cpu:                4
  ephemeral-storage:  102625208Ki
  ...
  memory:             3050720Ki
  pods:               110
Does anybody know how to give more memory to RDK?
P.S. The VM Resources configuration was the same for both k3d and RDK: 4 CPUs and 16 GB of memory.
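For reference, the k3d flag mentioned above is passed at cluster creation time; a minimal sketch (the cluster name "demo" is illustrative) looks like this:
# Create a k3d cluster whose server node is given 16 GB of memory
k3d cluster create demo --servers 1 --servers-memory 16g
# Check what the node reports
kubectl describe nodes | grep memory: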

Related

Kubernetes CPU/Memory Resource Allocation on Pod not correct

I have a deployment template which has the following resources settings:
Limits:
  cpu:     2
  memory:  8Gi
Requests:
  cpu:     500m
  memory:  2Gi
I believe Kubernetes will allocate my pod CPU between 500m and 2 cores, and memory between 2Gi and 8Gi.
However, when I SSH into the pod and run the commands below to check the CPU/memory, the resource allocation does not seem correct:
[root@xxx /]# cat /proc/meminfo
MemTotal: 15950120 kB
MemFree: 6629072 kB
MemAvailable: 12728888 kB
15950120 kB is around 16 GB.
grep 'cpu cores' /proc/cpuinfo | uniq
cpu cores : 2
The CPU on the pod is 2 cores.
So my question is: why, when I set the memory limit to 8Gi, do I see 16Gi in total?
Also, for the CPU, I only requested 500m of a core, so why does it show 2 cores?
You know, there is an explanation in the official Managing Resources for Containers Kubernetes documentation for why there are 2 cores instead of 500m, but it is harder to understand how 16Gi shows up instead of 8Gi.
So basically, the documentation says:
If the node where a Pod is running has enough of a resource available,
it's POSSIBLE (and ALLOWED) for a container TO USE MORE RESOURCES than
its request for that resource specifies. However, a container is not
allowed to use more than its resource limit.
For example, if you set a memory request of 256 MiB for a container,
and that container is in a Pod scheduled to a Node with 8GiB of memory
and no other Pods, then the container can try to use more RAM.
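What the quote doesn't spell out is that /proc/meminfo and /proc/cpuinfo are not cgroup-aware: inside a container they still report the node's total memory and CPUs, not the container's limits. The limits themselves are enforced through cgroups, and a rough way to check them from inside the pod (paths differ between cgroup v1 and v2, so treat this as a sketch) is:
# cgroup v1: memory limit in bytes (8Gi -> 8589934592)
cat /sys/fs/cgroup/memory/memory.limit_in_bytes
# cgroup v2: "max" means unlimited, otherwise the limit in bytes
cat /sys/fs/cgroup/memory.max
# cgroup v1 CPU limit: quota/period, e.g. 200000/100000 = 2 cores
cat /sys/fs/cgroup/cpu/cpu.cfs_quota_us /sys/fs/cgroup/cpu/cpu.cfs_period_us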

Why are K8s pod limits not being applied?

I am basically trying to apply pod CPU limits on my cluster. I edited my deployment.yaml (added requests and limits) and applied the yaml. Below are the observations:
I don't get any error when I apply the file (my existing app pod is terminated and a new one is spun up).
I can see a QoS class being applied when I do a describe on my pod (QoS Class: Burstable).
The issue is that the limits are not being honored: I can see in my metrics server that the pod CPU is > 300m, whereas the limit set is 200m.
I have an Istio sidecar container attached to the pod (but I only want to apply the limit to my app, not to Istio).
Snippet of yaml file:
resources:
  limits:
    cpu: 200m
    memory: 100Mi
  requests:
    cpu: 50m
    memory: 50Mi
Any ideas what else I need to check here? I went through all the documentation and get no errors, but the limits are not being applied. Thanks in advance!
Pod CPU includes all containers in the pod, while the limits you've specified apply only to the app container.
If you query metrics for the container alone, you will probably find that it's honouring the limits you've enforced upon it.
Here's an example Prometheus query you can use if you're running Prometheus on your cluster; it shows the ratio between each container's actual CPU usage and its CPU requests:
max(sum(irate(container_cpu_usage_seconds_total{container=~"<container_name>", image !="",namespace="<namespace>", pod=~"<Deployment_name>"}[5m])) by (pod, namespace))
/
max(sum(kube_pod_container_resource_requests_cpu_cores{namespace="<namespace>", container=~"<container_name>", pod=~"<deployment_name>"}) by (pod,namespace))
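If Prometheus isn't available, metrics-server can give a quick per-container view (rather than the pod total); the pod and namespace names below are placeholders:
kubectl top pod <pod_name> -n <namespace> --containers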

Kubernetes Pod OOMKilled Issue

The scenario is that we run some web sites based on an nginx image.
When our cluster was set up with nodes of 2 cores and 4GB RAM each,
the pods had the following configuration: cpu: 40m and memory: 100MiB.
Later, we upgraded our cluster to nodes of 4 cores and 8GB RAM each,
but we kept getting OOMKilled in every pod.
So we increased the memory on every pod to around 300MiB, and then everything seemed to work fine.
My question is: why does this happen, and how do I solve it?
P.S. If we revert back to each node having 2 cores and 4GB RAM, the pods work just fine with the decreased resources of 100MiB.
Any help would be highly appreciated.
Regards.
For each container in Kubernetes you can configure resources for both CPU and memory, like the following:
resources:
  limits:
    cpu: 100m
    memory: "200Mi"
  requests:
    cpu: 50m
    memory: "100Mi"
According to the documentation:
When you specify the resource request for Containers in a Pod, the scheduler uses this information to decide which node to place the Pod on. When you specify a resource limit for a Container, the kubelet enforces those limits so that the running container is not allowed to use more of that resource than the limit you set.
So if you set memory: 100Mi under resources.limits and your container consumes more than 100Mi of memory, then you will get an out-of-memory (OOM) error.
For more details about resource requests and limits, see the Kubernetes documentation on managing resources for containers.
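To confirm that a restart was actually caused by the memory limit, the container's last termination state can be inspected; a quick sketch (pod and namespace names are placeholders):
# Look for "Reason: OOMKilled" under Last State in the container status
kubectl describe pod <pod_name> -n <namespace>
# Or pull the termination reason directly
kubectl get pod <pod_name> -n <namespace> -o jsonpath='{.status.containerStatuses[*].lastState.terminated.reason}'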

How to fix ephemeral local storage problem?

I'm running a deployment on EKS (Kubernetes 1.16), and after ~5 minutes my pod gets Evicted with the following message:
Pod ephemeral local storage usage exceeds the total limit of containers 1Gi.
My node has 20Gi of ephemeral storage.
My QoS class is Guaranteed, and no matter what amount of ephemeral-storage I configure in my yaml, I see the same error with the amount I configured.
Do you have a clue what can be done?
My yaml file is here: https://slexy.org/view/s2096sex7L
It's because you're putting an upper limit on ephemeral-storage usage by setting resources.limits.ephemeral-storage to 1Gi. Remove limits.ephemeral-storage if that is safe, or change the value to fit your requirements.
resources:
  limits:
    memory: "61Gi"
    cpu: "7500m"
    ephemeral-storage: "1Gi"    <----- here
  requests:
    memory: "61Gi"
    cpu: "7500m"
    ephemeral-storage: "1Gi"
Requests and limits
If the node where a Pod is running has enough of a resource available, it’s possible (and allowed) for a container to use more resources than its request for that resource specifies. However, a container is not allowed to use more than its resource limit.
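If the workload genuinely needs the scratch space and you would rather keep an explicit limit than remove it, a raised value might look roughly like this (10Gi is an illustrative figure; it has to fit within the node's 20Gi of ephemeral storage):
resources:
  limits:
    memory: "61Gi"
    cpu: "7500m"
    ephemeral-storage: "10Gi"    # raised from 1Gi; size it to the pod's actual usage
  requests:
    memory: "61Gi"
    cpu: "7500m"
    ephemeral-storage: "10Gi"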
If you're reading this and you're using GKE Autopilot, there is a hard limit of 10G for ephemeral storage in Autopilot. I would recommend moving your storage to a Volume.
See the GKE Autopilot documentation for details.

Kubernetes pods failing on "Pod sandbox changed, it will be killed and re-created"

On a Google Container Engine (GKE) cluster, I sometimes see a pod (or more) not starting, and looking at its events I can see the following:
Pod sandbox changed, it will be killed and re-created.
If I wait, it just keeps retrying.
If I delete the pod and allow it to be recreated by the Deployment's ReplicaSet, it starts properly.
The behavior is inconsistent.
Kubernetes versions 1.7.6 and 1.7.8
Any ideas?
In my case it happened because the memory and CPU limits were too low.
For example, in your manifest file, increase the limits and requests from:
limits:
  cpu: 100m
  memory: 128Mi
requests:
  cpu: 100m
  memory: 128Mi
to this:
limits:
  cpu: 1000m
  memory: 2048Mi
requests:
  cpu: 500m
  memory: 1024Mi
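As an alternative to editing the manifest, the same change can be applied directly with kubectl set resources (the deployment name is a placeholder):
kubectl set resources deployment <deployment_name> --limits=cpu=1000m,memory=2048Mi --requests=cpu=500m,memory=1024Mi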
I can see the following message posted on the Google Cloud Status Dashboard:
"We are investigating an issue affecting Google Container Engine (GKE) clusters where after docker crashes or is restarted on a node, pods are unable to be scheduled.
The issue is believed to be affecting all GKE clusters running Kubernetes v1.6.11, v1.7.8 and v1.8.1.
Our Engineering Team suggests: If nodes are on release v1.6.11, please downgrade your nodes to v1.6.10. If nodes are on release v1.7.8, please downgrade your nodes to v1.7.6. If nodes are on v1.8.1, please downgrade your nodes to v1.7.6.
Alternative workarounds are also provided by the Engineering team in this doc. These workarounds are applicable to the customers that are unable to downgrade their nodes."
I was affected by the same issue on one node in a GKE 1.8.1 cluster (the other nodes were fine). I did the following:
Make sure your node pool has some headroom to receive all the pods scheduled on the affected node. When in doubt, increase the node pool by 1.
Drain the affected node following this manual:
kubectl drain <node>
You may run into warnings about daemonsets or pods with local storage; proceed with the operation.
Power down the affected node in Compute Engine. GKE should schedule a replacement node if your pool size is smaller than specified in the pool description.
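Put together, the workaround looks roughly like the following; the node name is a placeholder, and the extra flags are what is typically needed to get past the daemonset / local-storage warnings mentioned above (flag names vary slightly between kubectl versions):
# Stop new pods from being scheduled onto the node
kubectl cordon <node_name>
# Evict workloads; daemonset pods cannot be evicted, so ignore them,
# and allow pods using emptyDir local storage to be deleted
kubectl drain <node_name> --ignore-daemonsets --delete-local-data
# Then power the node down in Compute Engine and let GKE replace it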