Kubernetes - How to calculate resources we need for each container?

How do I figure out the minimum and maximum resources to allocate for each application deployment? I'm setting up a cluster, haven't set any resource requests or limits yet, and am letting it run freely.
I guess I could use the top command to figure out the load during peak time and work from that, but top reports something like 6% or 10%, and I'm not sure how to translate that into something like 0.5 CPU or 100 MB. Is there a method or formula to determine max and min based on top usage?
I'm running two t3.medium nodes with the following pods: httpd and tomcat in namespace1, mysql in namespace2, jenkins and gitlab in namespace3. Is there any guide to the minimum resources they need, or do I have to figure it out based on top or some other method?

There are a few things to discuss here:
Unix top and kubectl top are different:
Unix top uses the proc virtual filesystem and reads the /proc/meminfo file to get actual information about current memory usage.
kubectl top shows metrics based on reports from cAdvisor, which collects the resource usage. For example, kubectl top pod POD_NAME --containers shows metrics for a given pod and its containers, and kubectl top node NODE_NAME shows metrics for a given node.
You can use the metrics-server to get the CPU and memory usage of the pods. With it you will be able to Assign CPU Resources to Containers and Pods.
Optimally, your pods should be using exactly the amount of resources you requested, but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin above and below the request value as a good starting point. Regarding limits, achieving a good setting depends on trying and adjusting. There is no optimal value that fits everyone, as it depends on many factors related to the application itself, the demand model, the tolerance to errors, etc.
As a supplement I recommend going through the Managing Resources for Containers docs.
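For orientation, here is a minimal sketch of a container spec with requests and limits; the pod name, image, and all values below are illustrative assumptions rather than recommendations:

apiVersion: v1
kind: Pod
metadata:
  name: tomcat              # hypothetical pod name
spec:
  containers:
  - name: tomcat
    image: tomcat:9         # illustrative image
    resources:
      requests:
        cpu: "500m"         # 0.5 CPU, e.g. observed peak usage plus a ~25% margin
        memory: "512Mi"
      limits:
        cpu: "1"            # limit set above the request to absorb spikes
        memory: "1Gi"

Once requests like these are set, kubectl top lets you compare actual usage against them and adjust.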

Related

Kubernetes - find pods without memory limits (or whose memory limits are too high above requests)

On my Kubernetes cluster, my pods use about 15% (~4 GB) more memory than my memory requests. I suspect it's been the reason some of my nodes have been crashing lately. How can I easily find the misconfigured pods and add the missing limits (i.e. find pods without memory requests, or whose memory limits are too high compared to their requests)?
The easiest option is to use:
kubectl describe node your_node
This command gives you a lot of useful information about your node and the list of pods running on it. This list includes CPU Requests, CPU Limits, Mem Requests, Mem Limits, etc.
This is fine if you have just a few nodes, but if you have a lot of them it is not optimal.
Another good option is to use k9s. With the k9s CLI you get a nice overview of the running pods in your cluster, and if you use the "wide" view (ctrl-w) you can also see all your pods' limits and requests.
You can use custom-columns as the output format for a get request. The query syntax is JSONPath: https://kubernetes.io/docs/reference/kubectl/jsonpath/.
For example
#!/bin/bash
# Print the requests and limits of every container in every namespace,
# using kubectl's custom-columns output format (JSONPath expressions).
ns='NAMESPACE:.metadata.namespace'
pod='POD:.metadata.name'
container='CONTAINER:.spec.containers[*].name'
resource_req_mem='MEM_REQ:.spec.containers[*].resources.requests.memory'
resource_lim_mem='MEM_LIM:.spec.containers[*].resources.limits.memory'
resource_req_cpu='CPU_REQ:.spec.containers[*].resources.requests.cpu'
resource_lim_cpu='CPU_LIM:.spec.containers[*].resources.limits.cpu'
kubectl get pod -A -o custom-columns="$ns,$pod,$container,$resource_req_mem,$resource_lim_mem,$resource_req_cpu,$resource_lim_cpu"
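Since custom-columns prints <none> for fields that are not set, piping the output of the command above through grep '<none>' is a quick way to surface the pods that are missing a request or a limit.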

Why the CPU usage of a GKE Workload is not equal to the sum of the CPU usage of its pods?

I'm trying to figure out why a GKE "Workload" CPU usage is not equivalent to the sum of cpu usage of its pods.
The following image shows the Workload CPU usage.
Service Workload CPU Usage
The following images show pod CPU usage for the above Workload.
Pod #1 CPU Usage
Pod #2 CPU Usage
For example, at 9:45 the Workload CPU usage was around 3.7 cores, but at the same time Pod #1 CPU usage was around 0.9 cores and Pod #2 CPU usage was around 0.9 cores too. That means the Workload CPU usage should have been around 1.8 cores, but it wasn't.
Does anyone have an idea why this happens?
Thanks.
On your VM (the node managed by Kubernetes) you have the pods you deployed and manage, but also several system services that run on it for supervision, management, log ingestion, and so on. A basic description is available here.
You can see all these system services by running kubectl get all --namespace kube-system.
If you have installed additional components, like Istio or Knative, you have additional services and namespaces. All of these take a share of the node's resources.
Danny,
The CPU chart on the Workloads page is an aggregate of CPU usage for managed pods. The values are taken from the Stackdriver Monitoring metric container/cpu/usage_time, check this link. That metric represents "Cumulative CPU usage on all cores in seconds. This number divided by the elapsed time represents usage as a number of cores, regardless of any core limit that might be set."
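For example, if the cumulative usage_time for a pod increases by 54 core-seconds over a 60-second window, the chart shows 54 / 60 = 0.9 cores for that pod during that window.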
Please let me know if you have further questions in regard to this.
I suspect this is a bug in the UI. There is no actual metric for deployment CPU usage. Stackdriver Monitoring only collects data on container, pod, and node level metrics thus the only really reliable metrics in this case are the ones for pod CPU usage.
The graph for the total deployment CPU usage is likely meant to be a sum of all the pod metrics, calculated and then presented to you. It is not as reliable as the pod or container metrics since it is not a direct metric.
If you are seeing this discrepancy consistently, I recommend opening a UI bug report through the Google Public Issue Tracker to report this to the GCP Engineers.

what are recommendation for pod size(CPU, memory) in kubernetes

I want to know the recommendations for pod size, i.e. when should an application run inside a pod, and at what size is it better to use the machine itself instead of a pod?
For example, when should we think about moving out of Kubernetes and running an application as an external service: when a pod requires 8 GB, 16 GB, or 32 GB? The same question applies to CPU-intensive workloads.
Because if a pod requires 16 GB or 16 CPUs and we have a machine/node of the same size, then I think there is no sense in running that pod on that machine. In that scenario we would end up with something like 10 pods requiring 8 nodes.
I hope you understand my concern.
If someone has recommendations, please share your thoughts; references would be even better.
Recommendations for the ideal range of:
size of pods in terms of RAM and CPU
pod-to-node ratio, i.e. number of pods per node
whether this applies to stateless applications, stateful applications, or both
etc.
Running a 16 CPU/16 GB pod on a 16 CPU/16 GB machine is normal. Why not? You may think of pods as tiny, but there is no such requirement. Pods can be gigantic; there is no issue with that. Remember that a container is just a process on a node, so why refuse to run a fat process on a fat node? Kubernetes adds a very nice orchestration level to containers, so why not make use of it?
There is no such thing as a universal or recommended pod size. Asking for a recommended pod size is the same as asking for a recommended size for a VM or bare-metal server. It is totally up to your application. If your application requires 16 or 64 GB of RAM, then that is the recommended size for you.
Regarding the pod-to-node ratio: the current default upper limit in Kubernetes is 110 pods per node, and everything below that watermark is fine. The only thing is that the recommended master node size increases with the total number of pods. If you have around 1,000 pods, small to medium master nodes are enough; if you have over 10,000 pods, you should increase your master node size.
Regarding statefulness: stateless applications generally survive better. But state often has to be stored somewhere, and stored reliably. So if you design your application as a set of microservices, create as many stateless apps as you can and as few stateful ones as you can. Ideally, only the relational databases should be truly stateful.

Kubernetes - NodeUnderMemoryPressure Issue

I'm very new to Kubernetes. We are using a Kubernetes cluster on Google Cloud Platform.
I have created a cluster, services, pods, and replication controllers.
I have created a Horizontal Pod Autoscaler based on CPU utilization.
Cluster details:
Default running node count is set to 3
3 GB allocatable memory per node
After running for about an hour, the services and nodes start showing NodeUnderMemoryPressure issues.
How can I resolve this?
If you need any more details, please ask.
Thanks
I don't know how much traffic is hitting your cluster, but I would highly recommend running Prometheus in your cluster.
Prometheus is an open-source monitoring and alerting tool, and integrates very well with Kubernetes.
This tool should give you a much better view of memory consumption, CPU usage, amongst many other monitoring capabilities, that will allow you to effectively troubleshoot these types of issues.
There are several ways to address this issue, depending on the type of your workloads.
The easiest is simply to scale your nodes, but that can be useless if there is a memory leak. Even if you are not affected by one now, you should always consider the possibility of a memory leak, therefore the best practice is to always set memory limits for pods and namespaces.
Scale the cluster
If you have many pods running and none of them is way bigger than the others, it can be useful to scale your cluster horizontally; this way the number of running pods per node goes down and the NodeUnderMemoryPressure warning should disappear.
If you are running few pods, or some of them are capable of making the cluster suffer on their own, then the only option is to scale the nodes vertically, adding a new node pool with Compute Engine instances that have more memory and possibly deleting the old one.
If your workloads are correct and memory suffers only because at certain moments of the day you receive 100 times the usual traffic and create more pods to support it, you should consider making use of the autoscaler (see the sketch below).
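As an illustration of that last point, a Horizontal Pod Autoscaler similar to the one described in the question could look roughly like the sketch below; the Deployment name web, the replica bounds, and the 70% target are made-up values, not recommendations:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web                    # hypothetical name, targets a Deployment called "web"
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2               # illustrative floor
  maxReplicas: 10              # illustrative ceiling
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70 # scale out when average usage exceeds 70% of the CPU request

Note that scaling on CPU utilization only works if the pods declare CPU requests, which ties back to the advice about always setting requests and limits.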
Check for memory leaks
On the other hand, if it is not a "healthy" situation and you have pods consuming way more RAM than expected, then you should follow the advice of grizzthedj and understand why your pods are consuming so much; verify whether some of your containers are affected by a memory leak, in which case scaling up the amount of RAM is useless since at some point you will run out of it anyway.
So start by identifying which pods consume too much and then troubleshoot why they behave this way; if you do not want to use Prometheus, simply SSH into the container and check with the classic Linux commands.
Limit the RAM consumed by pods
To prevent this from happening in the future, I advise you, when writing the YAML files, to always limit the amount of RAM the pods can use; this way you keep them under control and are sure there is no risk that they cause the Kubernetes node agent (the kubelet) to fail because it runs out of memory.
Consider also limiting the CPU and introducing minimum requests for both RAM and CPU, to help the scheduler place the pods properly and avoid hitting NodeUnderMemoryPressure under high workload.
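As a sketch of how to enforce this per namespace, a LimitRange gives every container in the namespace a default request and limit unless it declares its own; the object name, namespace, and sizes below are examples only:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-resources     # hypothetical name
  namespace: namespace1       # example namespace from the first question
spec:
  limits:
  - type: Container
    defaultRequest:           # applied when a container specifies no request
      cpu: 250m
      memory: 256Mi
    default:                  # applied when a container specifies no limit
      cpu: 500m
      memory: 512Mi

A ResourceQuota on the same namespace can additionally cap the total CPU and memory that all pods in it may request.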

Why does a single node cluster only have a small percentage of the cpu quota available?

pod will not start due to "No nodes are available that match all of the following predicates:: Insufficient cpu"
In the above question, I had an issue starting a deployment with 3 containers.
Upon further investigation, it appears there is only 27% of the CPU quota available - which seems very low. The rest of the CPU seems to be assigned to some default bundled containers.
How is this normally mitigated? Is a larger node required? Do limits need to be set manually? Are all those additional containers necessary?
1 cpu for a single node cluster is probably too small.
From the containers in the original answer, both the dashboard and fluentd can be removed:
the dashboard is just a web UI, which can go away if you use kubectl (which you should, IMO);
fluentd should be reading the log files on disk to ship them somewhere (GCP's log aggregation, I think).
The unnecessary containers should be tied to a Deployment or ReplicaSet, which can be listed with kubectl get deployment and kubectl get rs, respectively. You can then kubectl delete them.
Increasing the resources on the node should not change the requirements of those basic pods, meaning they should all schedule freely and the remaining capacity stays available for your own workloads.