Kubernetes - node capacity

I'm running a small node in gcloud with 2 pods running. The Google Cloud console shows the following resource utilization:
<40% CPU utilization
about 8k network bytes
about 64 disk bytes.
When adding the next pod, it fails with the error below.
FailedScheduling:Failed for reason PodExceedsFreeCPU and possibly others
Based on the numbers I see in the Google console, ~60% of the CPU is available. Is there any way to get more logs? Am I missing something obvious here?
Thanks in advance !

Kubernetes reserves capacity for the CPU and memory that pods request, so you should check the capacity already allocated on the node rather than its utilization.
kubectl describe nodes
You can find a deeper description about the capacity of the nodes in: http://kubernetes.io/docs/user-guide/compute-resources/

In your Helm chart or Kubernetes YAML, check the resources section. Even if the node has free capacity in practice, a pod will fail to schedule when its request would push the total requested resources past what the node can allocate, regardless of how much the pod would actually use. The request asks for a reservation of capacity. For example:
serviceAccountName: xxx
containers:
- name: xxx
  image: xxx
  command:
  - cat
  tty: true
  resources:
    requests:
      memory: "256Mi"
      cpu: "250m"
    limits:
      memory: "256Mi"
      cpu: "250m"
If the CPU value there would make the node oversubscribed, the pod won't be scheduled. So make sure your requests reflect actual typical usage. If they already do, then you need more capacity.
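To make the "reservation" idea concrete, here is a rough sketch of the check the scheduler effectively performs for CPU. The node size (940m allocatable) and the existing pod requests are made-up numbers for illustration, and `parse_cpu` and `fits` are hypothetical helpers, not part of any Kubernetes client library.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("250m", "1", "0.5") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(allocatable_cpu: str, existing_requests: list, new_request: str) -> bool:
    """Return True if the new pod's CPU request fits in the unreserved capacity.

    Only requests are counted; actual utilization plays no part.
    """
    reserved = sum(parse_cpu(q) for q in existing_requests)
    free = parse_cpu(allocatable_cpu) - reserved
    return parse_cpu(new_request) <= free


# Hypothetical node: 940m allocatable, two pods already requesting 400m each.
print(fits("940m", ["400m", "400m"], "250m"))  # False: only 140m is unreserved
print(fits("940m", ["400m", "400m"], "100m"))  # True
```

The real numbers for your node are in the "Allocatable" and "Allocated resources" sections of the kubectl describe nodes output.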


Kubernetes. What happens if the request size is greater than the pod's RAM?

I don't understand how replication works in Kubernetes.
I understand that two replicas on different nodes will provide fault tolerance for the application, but I don’t understand this:
Suppose the application is given the following resources:
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "1G"
        cpu: "1"
      limits:
        memory: "1G"
        cpu: "1"
The application has two replicas. Thus, in total, 2 CPUs and 2G RAM are available for applications.
But what happens if the application receives a request with a size of 1.75G? After all, only 1G RAM is available in one replica. Will the request be distributed among all replicas?
Reply to Harsh Manvar:
Maybe you misunderstood me?
What you explained is not entirely true.
Here is a real, working deployment of four replicas:
$ kubectl get deployment dev-phd-graphql-server-01-master-deployment
dev-phd-graphql-server-01-master-deployment 4/4 4 4 6d15h
$ kubectl describe deployment dev-phd-graphql-server-01-master-deployment
Limits:
  cpu: 2
  memory: 4G
Requests:
  cpu: 2
  memory: 4G
No, it won't get distributed. One replica will simply start and the other will stay in the Pending state.
If you describe that pending pod (replica), it shows this error:
0/1 nodes available: insufficient cpu, insufficient memory
kubectl describe pod POD-name
K8s will check for the requested resources
memory: "1G"
cpu: "1"
If the requested minimum resources are available, it will deploy the replica, and the other one will go into the Pending state.
But what happens if the application receives a request with a size of
1.75G? After all, only 1G RAM is available in one replica.
resources:
  requests:
    memory: "1G"
    cpu: "1"
  limits:
    memory: "1G"
    cpu: "1"
If you have set a limit of 1 GB and the application starts using 1.75 GB, the pod will be killed or restarted because it hit the limit.
But yes, in some cases a container can exceed its request if the node has memory available.
A Container can exceed its memory request if the Node has memory
available. But a Container is not allowed to use more than its memory
limit. If a Container allocates more memory than its limit, the
Container becomes a candidate for termination. If the Container
continues to consume memory beyond its limit, the Container is
terminated. If a terminated Container can be restarted, the kubelet
restarts it, as with any other type of runtime failure.
Read more at: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-container-s-memory-limit
You might also like to read this: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run
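As a rough illustration of the rule quoted above, the sketch below classifies a container's memory usage against its request and limit. `parse_memory` and `memory_outcome` are hypothetical helpers written for this example; they only handle the common suffixes and are not how the kubelet is implemented.

```python
# Decimal (k, M, G) and binary (Ki, Mi, Gi) suffixes used in Kubernetes quantities.
UNITS = {"k": 10**3, "M": 10**6, "G": 10**9, "Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def parse_memory(quantity: str) -> int:
    """Convert a memory quantity such as "1G" or "256Mi" to bytes."""
    for suffix in sorted(UNITS, key=len, reverse=True):  # check "Gi" before "G"
        if quantity.endswith(suffix):
            return int(float(quantity[: -len(suffix)]) * UNITS[suffix])
    return int(quantity)  # no suffix: plain bytes


def memory_outcome(usage: str, request: str, limit: str) -> str:
    """Describe what the quoted documentation says happens at this usage level."""
    used = parse_memory(usage)
    if used > parse_memory(limit):
        return "over limit: candidate for termination, restarted if possible"
    if used > parse_memory(request):
        return "over request: allowed only while the node has spare memory"
    return "within request"


print(memory_outcome("1.75G", "1G", "1G"))  # the case from the question
print(memory_outcome("1.5G", "1G", "2G"))   # request exceeded, limit not reached
```

With request and limit both set to 1G, as in the question, there is no room between the two, so anything above 1G lands in the "over limit" branch.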
For these kinds of circumstances you need to have an idea of how large a request can get, and set up your resource requests and limits accordingly.
If you feel there can be a request as big as 1.75GB, you have to tackle it in your source code.
For example, you might have a conversion job which takes a lot of resources. You can make it a Celery task and host the Celery worker in another node group which is made for large tasks (an AWS t3.xlarge, for example).
Anyway, such large tasks will not generate a result immediately, so I don't see a problem in running them asynchronously and giving back the result later, maybe even in a websocket message. This will keep your main server from getting clogged up and will also help you scale your large tasks efficiently.

Why is my deployment not using the requested cpu in Kubernetes Minikube?

I have created a deployment with the following resources:
resources:
  requests:
    memory: "128Mi"
    cpu: "0.45"
  limits:
    memory: "128Mi"
    cpu: "0.8"
Using the minikube metrics server I can see that my pod's CPU usage is below the requested 450m and is only around 150m. Shouldn't it always use 450m as a minimum value since I requested it in my .yaml file? The CPU usage goes up only if I dramatically increase the workload of the deployment. Can I have my deployment use 450m as a baseline and not go below that value?
The requested value is a hint for the scheduler that helps it place the workload well. If your application does not make use of the requested resources, this is fine.
The limit ensures that no more resources are used: for CPU the workload is throttled, and if more RAM is used, the workload is killed (out of memory).
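A toy model may help here. It is a deliberate simplification, not how the kernel accounts for CPU: observed usage follows what the application actually demands, capped by the limit, and the request never acts as a floor. `observed_cpu` is a hypothetical function written for this example.

```python
def observed_cpu(demand_m: int, request_m: int, limit_m: int) -> int:
    """Simplified model of observed CPU usage, in millicores.

    request_m is intentionally unused: the request reserves capacity for
    scheduling, it does not force the container to consume anything.
    """
    return min(demand_m, limit_m)


# Values from the question: request 450m (0.45), limit 800m (0.8).
print(observed_cpu(demand_m=150, request_m=450, limit_m=800))   # 150, idle-ish app
print(observed_cpu(demand_m=1200, request_m=450, limit_m=800))  # 800, throttled
```

So a reading of about 150m with a 450m request just means the app is not busy, and there is no setting that makes it "use" the request as a baseline.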

CPU request on kubernetes

I have a resource block for my pod like this:
resources:
  limits:
    cpu: 3000m
    memory: 512Mi
  requests:
    memory: 512Mi
Does it by default set the CPU request to the value given in the resource limits (i.e. 3000m)? In my case it is taking 3000m as the default CPU request even though I have not specified it.
What you observed is correct. K8s will assign a requests.cpu that matches the limits.cpu when you only define the limits.cpu and not the requests.cpu. Official document here.
From the Kubernetes documentation:
If you specify a CPU limit for a Container but do not specify a CPU request, Kubernetes automatically assigns a CPU request that matches the limit. Similarly, if a Container specifies its own memory limit, but does not specify a memory request, Kubernetes automatically assigns a memory request that matches the limit.
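The defaulting rule quoted above can be sketched in a few lines. This only mimics the documented behaviour on a plain dict; `apply_request_defaults` is a hypothetical helper, not an API of Kubernetes or its Python client.

```python
def apply_request_defaults(resources: dict) -> dict:
    """Copy each limit into requests when no request was given for that resource."""
    limits = dict(resources.get("limits", {}))
    requests = dict(resources.get("requests", {}))
    for name in ("cpu", "memory"):
        if name in limits and name not in requests:
            requests[name] = limits[name]  # request defaults to the limit
    return {"limits": limits, "requests": requests}


# The resource block from the question: CPU limit set, CPU request omitted.
spec = {
    "limits": {"cpu": "3000m", "memory": "512Mi"},
    "requests": {"memory": "512Mi"},
}
print(apply_request_defaults(spec)["requests"])
```

For the block in the question this yields a CPU request of 3000m, which matches what was observed. If you want a smaller reservation, set requests.cpu explicitly.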

Kubernetes Deployment makes great use of cpu and memory without stressing it

I have deployed an app on Kubernetes and would like to test HPA.
With the kubectl top nodes command, I noticed that CPU and memory usage increased without stressing it.
Does it make sense?
Also, while stressing the deployment with Apache Bench, CPU and memory don't increase enough to pass the target and create a replica.
My Deployment YAML file is too big to post in full. This is one of my containers.
- name: web
  image: php_apache:1.0
  imagePullPolicy: Always
  resources:
    requests:
      memory: 50Mi
      cpu: 80m
    limits:
      memory: 100Mi
      cpu: 120m
  volumeMounts:
  - name: shared-data
    mountPath: /var/www/html
  ports:
  - containerPort: 80
It consists of 15 containers.
I have a VM that contains a cluster with 2 nodes (master, worker).
I would like to stress the deployment so that I can see it scale up.
But here I think there is a problem! Without stressing the app, the CPU/memory usage of the pod has already passed the target and 2 replicas have been created.
I know that the higher the requests I give to the containers, the lower that percentage is.
But does it make sense for memory/CPU usage to be increased from the beginning, without stressing it?
I would like the left part of the target (the memory usage of the pods) to start at 0% and to increase as I stress the app, creating replicas.
But while I'm stressing it with Apache Bench, the value increases by at most 10%.
We can see here the usage of CPU:
kubectl top pods
NAME CPU(cores) MEMORY(bytes)
x-app-55b54b6fc8-7dqjf 76m 765Mi
59% is the memory usage of the pod, computed as memory usage / sum of memory requests. In my case 59% = 765Mi/1310Mi.
HPA yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 35
With the kubectl top nodes command, I noticed that CPU and memory usage increased without stressing it. Does it make sense?
Yes, it makes sense. See what the Google Cloud documentation says about requests and limits:
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
But does it make sense for memory/CPU usage to be increased from the beginning, without stressing it?
Yes. For example, your web container can start with memory: 50Mi and cpu: 80m, but it is allowed to grow to memory: 100Mi and cpu: 120m. Also, as you mentioned, you have 15 containers in total, so depending on their requests and limits the pod can reach more than 35% of its requested memory.
In the HPA documentation, under algorithm details, you can find this information:
When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target. Before checking the tolerance and deciding on the final values, we take pod readiness and missing metrics into consideration, however.
All Pods with a deletion timestamp set (i.e. Pods in the process of being shut down) and all failed Pods are discarded.
If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics will be used to adjust the final scaling amount.
Not sure about the last question:
59% is the memory usage of the pod, computed as memory usage / sum of memory requests. In my case 59% = 765Mi/1310Mi.
In your HPA you configured another pod to be created when averageUtilization reaches 35% of memory. It reached 59%, so it created another pod. As the HPA target is memory, the HPA does not take CPU into account at all. Also, please keep in mind that because this is an average, it needs about 1 minute for the values to change.
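For reference, the core formula from the HPA algorithm details is desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue). The sketch below applies it to the numbers in this thread. It assumes the default tolerance of 0.1 and leaves out readiness handling and the minReplicas/maxReplicas clamp, so treat it as an approximation; `desired_replicas` is a hypothetical helper.

```python
import math


def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float,
                     tolerance: float = 0.1) -> int:
    """Approximate the HPA replica calculation for a utilization target."""
    ratio = current_utilization / target_utilization
    # Assumed default behaviour: no scaling when the ratio is close enough to 1.0.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    return math.ceil(current_replicas * ratio)


# 1 replica at 59% memory utilization against a 35% target.
print(desired_replicas(1, 59, 35))  # 2
```

This is why a second replica appears without any load: the pod idles at 59% of its memory requests, which is already well above the 35% target.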
For a better understanding of how HPA works, please try this walkthrough.
If this was not helpful, please clarify what exactly you are asking.

Google Kubernetes Engine (GKE) CPU/pod

On GKE I have created a cluster with 1 node and n1-standard-1 instance type (vCPU:1, RAM: 3.75 GB). The main purpose of the cluster is to host an application that has 3 pods (mysql, backend and frontend) on default namespace. I can deploy mysql with no problem. After that when I try to deploy the backend it just remains in "Pending" state saying that not enough CPU is available. The message is very verbose.
So my question is: is it not possible to have 3 pods running using 1 CPU unit? What I want is to reduce cost and let those pods share the same CPU. Is it possible to achieve that? If yes, then how?
The error message "pending" is not that informative. Could you please run
kubectl get pods
to get your pod name, and then run
kubectl describe pod {podname}
Then you can get an idea about the error message.
By the way, you can run 3 pods on a single CPU.
Yes, it is possible to have multiple pods, or 3 in your case, on a single CPU unit.
If you want to manage your resources, consider putting constraints such as those described in the official docs. Below is an example.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
One would need more information regarding your deployment to answer your queries in a more detailed manner. Please consider providing it.