I have a Kubernetes cluster running on GKE, and I created a new namespace with a ResourceQuota:
```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: bot-quota
spec:
  hard:
    requests.cpu: '500m'
    requests.memory: 1Gi
    limits.cpu: '1000m'
    limits.memory: 2Gi
```
which I apply to my namespace (called bots). `kubectl describe resourcequota --namespace=bots` then gives me:
```
Name:       bot-quota
Namespace:  bots
Resource         Used  Hard
--------         ----  ----
limits.cpu       0     1
limits.memory    0     2Gi
requests.cpu     0     500m
requests.memory  0     1Gi

Name:       gke-resource-quotas
Namespace:  bots
Resource                    Used  Hard
--------                    ----  ----
count/ingresses.extensions  0     5k
count/jobs.batch            0     10k
pods                        0     5k
services                    0     1500
```
This is what I expect: the bots namespace should be hard-limited to the limits above.
Now I would like to deploy a single pod into that namespace, using this simple yaml:
```yaml
apiVersion: v1
kind: Pod
metadata:
  name: podname
  namespace: bots
  labels:
    app: someLabel
spec:
  nodeSelector:
    cloud.google.com/gke-nodepool: default-pool
  containers:
    - name: containername
      image: something-image-whatever:latest
      resources:
        requests:
          memory: '96Mi'
          cpu: '300m'
        limits:
          memory: '128Mi'
          cpu: '0.5'
      args: ['0']
```
Given the resources specified, I'd expect to be well within range deploying a single instance. When I apply the yaml, though:
```
Error from server (Forbidden): error when creating "pod.yaml": pods "podname" is forbidden: exceeded quota: bot-quota, requested: limits.cpu=2500m, used: limits.cpu=0, limited: limits.cpu=1
```
If I change the pod's CPU limit to 0.3, the same error appears, with limits.cpu=2300m requested.
In other words: something seems to mysteriously add 2000m (= 2 CPU) to my limit.
We do NOT have any LimitRange applied.
What am I missing?
As discussed in the comments above, it is indeed related to Istio. How?
As is (now) obvious, the requests and limits are specified at container level, NOT at pod/deployment level. Why is that relevant?
Running Istio (in our case, managed Istio on GKE), the container is not alone in the workload: the pod also contains istio-init (which terminates soon after starting) plus istio-proxy.
And these additional containers bring their own requests & limits; in the pod I am currently looking at, for example:
```
Limits:
  cpu:     2
  memory:  1Gi
Requests:
  cpu:     100m
  memory:  128Mi
```
on istio-proxy (seen with `kubectl describe pod <podid>`).
This indeed explains why the WHOLE pod has 2 CPU more in its limits than expected.
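If the sidecar's defaults blow your quota, recent Istio versions let you override the proxy's resources per pod via annotations; a hedged sketch (the values are illustrative, and whether managed Istio on GKE honors these depends on the installed version):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: podname
  namespace: bots
  annotations:
    sidecar.istio.io/proxyCPU: "50m"           # request for istio-proxy
    sidecar.istio.io/proxyCPULimit: "200m"     # limit for istio-proxy
    sidecar.istio.io/proxyMemory: "64Mi"
    sidecar.istio.io/proxyMemoryLimit: "128Mi"
spec:
  containers:
    - name: containername
      image: something-image-whatever:latest
```

With the proxy capped like this, the pod's aggregate limits stay within the ResourceQuota.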
I am trying to understand how the HPA works, but I have some concerns.
My service is set up like this:
```yaml
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi
```
and I configure hpa in this way:
```yaml
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
```
Does this configuration prevent my service from reaching its limit (500m)?
Is it better to configure a higher value, like 80%?
I ask because with this configuration I see pods scaled to the maximum number even though they are using less CPU than their limits:
```
NAME                            CPU(cores)  MEMORY(bytes)
test-service-76f8b8c894-2f944   189m        283Mi
test-service-76f8b8c894-2ztt6   183m        278Mi
test-service-76f8b8c894-4htzg   117m        233Mi
test-service-76f8b8c894-5hxhv   142m        193Mi
test-service-76f8b8c894-6bzbj   140m        200Mi
test-service-76f8b8c894-6sj5m   149m        261Mi
```
The amount of CPU used is less than the request configured in the service definition.
Moreover, I have seen it discussed here as well, but I didn't get the answer:
Using Horizontal Pod Autoscaling along with resource requests and limits
Does this configuration prevent my service from reaching its limit (500m)?
No, the HPA does not prevent it (although resources.limits does). What the HPA does is start new replicas when the average CPU utilization across all pods rises above 50% of the requested CPU, i.e. above 125m.
Is it better to configure a higher value, like 80%?
Can't say; it is application-specific.
Horizontal autoscaling is pretty well described in the documentation.
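To make the scaling arithmetic concrete with the numbers from the question (the HPA's rule is roughly desiredReplicas = ceil(currentReplicas × currentAverageUtilization / targetUtilization)):

```
target per pod  = 50% of the 250m request              = 125m
average usage   = (189+183+117+142+140+149) / 6        ≈ 153m
desired         = ceil(6 × 153 / 125) = ceil(7.36) = 8 → capped at maxReplicas = 6
```

Every pod sits above the 125m target even though all are far below the 500m limit, so the HPA staying at the maximum is exactly the configured behavior.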
I'm using GKE's Autopilot cluster to run some Kubernetes workloads. Pods scheduled to one particular node get stuck in the Init phase for around 10 minutes; the same pod on a different node is up in seconds.
deployment.yaml
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jobs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job
  template:
    metadata:
      labels:
        app: job
    spec:
      volumes:
        - name: shared-data
          emptyDir: {}
      initContainers:
        - name: init-volume
          image: gcr.io/dummy_image:latest
          imagePullPolicy: Always
          resources:
            limits:
              memory: "1024Mi"
              cpu: "1000m"
              ephemeral-storage: "10Gi"
          volumeMounts:
            - name: shared-data
              mountPath: /data
          command: ["/bin/sh", "-c"]
          args:
            - cp -a /path /data;
      containers:
        - name: job-server
          resources:
            requests:
              ephemeral-storage: "5Gi"
            limits:
              memory: "1024Mi"
              cpu: "1000m"
              ephemeral-storage: "10Gi"
          image: gcr.io/jobprocessor:latest
          imagePullPolicy: Always
          volumeMounts:
            - name: shared-data
              mountPath: /ebdata1
```
This happens only if the pod has an init container. In my case, I'm copying some data from a dummy container to a shared volume, which I then mount into the actual container.
Whenever pods get scheduled to this particular node, they get stuck in the Init phase for around 10 minutes and then resolve automatically. I couldn't see any errors in the event logs.
```
kubectl describe node problematic-node

Events:
  Type     Reason      Age  From            Message
  ----     ------      ---  ----            -------
  Warning  SystemOOM   52m  kubelet         System OOM encountered, victim process: cp, pid: 477887
  Warning  OOMKilling  52m  kernel-monitor  Memory cgroup out of memory: Killed process 477887 (cp) total-vm:2140kB, anon-rss:564kB, file-rss:768kB, shmem-rss:0kB, UID:0 pgtables:44kB oom_score_adj:-997
```
The only message is the warning above. Is this issue caused by some misconfiguration on my side?
The best recommendation is to manage container compute resources properly within your Kubernetes cluster. When creating a Pod, you can optionally specify how much CPU and memory (RAM) each container needs, to avoid OOM situations.
When Containers have resource requests specified, the scheduler can make better decisions about which nodes to place Pods on. And when Containers have their limits specified, contention for resources on a node can be handled in a specified manner. CPU specifications are in units of cores, and memory is specified in units of bytes.
An event is produced each time the scheduler fails; use the command below to see the status of events:
```
$ kubectl describe pod <pod-name> | grep Events
```
Also, read the official Kubernetes guide on “Configure Out Of Resource Handling”. Always make sure to:
- reserve 10-20% of memory capacity for system daemons like the kubelet and the OS kernel
- identify pods which can be evicted at 90-95% memory utilization to reduce thrashing and the incidence of system OOMs
To facilitate this kind of scenario, the kubelet would be launched with options like below:
```
--eviction-hard=memory.available<xMi
--system-reserved=memory=yGi
```
Having Heapster container monitoring in place is helpful for visualization.
Further reading: Kubernetes and Docker Administration.
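For the deployment in the question specifically, one hedged adjustment worth trying (the values are illustrative, not a verified fix): give the init container an explicit memory request, and more headroom on the limit, so the page cache built up by the cp during the copy fits inside the memory cgroup:

```yaml
initContainers:
  - name: init-volume
    image: gcr.io/dummy_image:latest
    resources:
      requests:
        memory: "2048Mi"           # illustrative: raised from 1024Mi
        cpu: "1000m"
        ephemeral-storage: "10Gi"
      limits:
        memory: "2048Mi"
        cpu: "1000m"
        ephemeral-storage: "10Gi"
```

The OOM events above show the cp process itself holds almost no anonymous memory, which points at cgroup-level pressure during the copy rather than a leak.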
I created two deployments (the deployment happens via a Kubernetes operator, and there are other activities there too, like service creation and secret creation, though I feel they are not related to this error) and expected the pods to come up, but they didn't.
When I checked the events I found the below error for both pods (I am listing one):
```
60m  Warning  FailedCreate  replicaset/sast-edge-5577bdcf6c  Error creating: pods "sas-edge-5577bdcf6c-94ps9" is forbidden: failed quota: app-xxxx-stable-biw9247u7z: must specify limits.memory
```
When I describe the pod, I see that the limits have been specified:
```yaml
image: registry.xxx.xxxx.com/xxkl/xxx-xxxx:1.4
imagePullPolicy: IfNotPresent
name: nsc-core
ports:
  - containerPort: 3000
    protocol: TCP
resources:
  limits:
    memory: 500Mi
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
```
I also checked the quota for the NS
```
Name:       app-xxxx-stable-biw9247u7z
Namespace:  app-xxxx-stable-biw9247u7z
Resource       Used    Hard
--------       ----    ----
limits.memory  4072Mi  8Gi
```
I am not sure why Kubernetes is not seeing the specified resource limit. Need help.
The "Forbidden: failed quota" error occurs when any of the containers in the pod lacks limits and requests in its spec, and that includes init containers too. Adding limits and requests to all containers should resolve the error.
Add something like this under the spec section:
```yaml
containers:
  - name: nsc-core
    image: registry.xxx.xxxx.com/xxkl/xxx-xxxx:1.4
    resources:
      limits:
        cpu: "2"
        memory: 2000Mi
      requests:
        cpu: "0.2"
        memory: 1500Mi
```
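Since init containers are counted against the quota as well, they need the same treatment; a hedged sketch (the name, image, and values here are illustrative, not taken from the question):

```yaml
initContainers:
  - name: init-config          # illustrative name
    image: busybox:1.36
    resources:
      limits:
        cpu: "0.5"
        memory: 200Mi
      requests:
        cpu: "0.1"
        memory: 100Mi
```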
For development, I need to reduce minikube's CPU usage.
My goal is to limit the CPU usage of each kube-system pod.
I tried to modify the CPU limits of kube-dns in the YAML editor of the dashboard, but I get an error (probably because it is a system pod).
Is there a way to modify those kube-system .yml files before starting minikube and get a customized kube-system?
I'm using minikube on windows.
Thank you in advance for your help.
To achieve your goal, you can try using limits for the namespace.
You may create a YAML like this:
```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: cpu-limit-range
spec:
  limits:
    - default:
        cpu: 1
      defaultRequest:
        cpu: 0.5
      type: Container
```
and set the amount you need (for example 0.003), then apply it to the kube-system namespace:
```
kubectl create -f LimitRangeCPU.yaml --namespace=kube-system
```
In this case, if pods have no pre-configured resource limits, the LimitRange will be used as the default for all pods in the kube-system namespace, where all system pods are located.
Update
You can export your existing configs:
```
kubectl get -o=yaml -n kube-system --export deployment.extensions/kube-dns > kube-dns.yaml
```
then update resource usage in this section:
```yaml
resources:
  limits:
    memory: 170Mi
  requests:
    cpu: 100m
    memory: 70Mi
```
and apply the changes:
```
kubectl apply -f ./kube-dns.yaml --namespace=kube-system
```
It should help to update existing pods.
I have tried using an HPA for an RC containing only one container, and it works perfectly fine. But when I have an RC with multiple containers (i.e., a pod containing multiple containers), the HPA is unable to scrape the CPU utilization and shows the status as "unknown", as shown below. How can I successfully implement an HPA for an RC with multiple containers? The Kubernetes docs have no information about this, and I didn't find any mention of it not being possible. Can anyone share their experience or point of view on this issue? Thanks a lot.
```
NAME                              REFERENCE                          TARGETS          MINPODS  MAXPODS  REPLICAS  AGE
prometheus-watch-ssltargets-hpa   ReplicationController/prometheus   <unknown> / 70%  1        10       0         4s
```
Also for your reference, below is my HPA yaml file.
```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-watch-ssltargets-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: prometheus
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
```
By all means it is possible to set up an HPA for an RC/Deployment/ReplicaSet with multiple containers. In my case the problem was the format of the resource request. I figured out from this link that if any of the pod's containers is missing the relevant resource request, CPU utilization for the pod is undefined and the HPA will not take any action for that metric. In my case I was using the resource spec below, which caused the error. (Note that this format works absolutely fine with deployments, replication controllers, etc.; only when I additionally wanted to implement an HPA did it cause the problem mentioned in the question.)
```yaml
resources:
  limits:
    cpu: 2
    memory: 200M
  requests:
    cpu: 1
    memory: 100Mi
```
But after changing it as below (i.e., with a resource request format the HPA can understand), it works fine:
```yaml
resources:
  limits:
    cpu: 2
    memory: 200Mi
  requests:
    cpu: 1
    memory: 100Mi
```
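As a side note: on clusters where the autoscaling/v2 API is available, you can also target one specific container's utilization with a ContainerResource metric instead of the pod-level average, which sidesteps ambiguity in multi-container pods; a sketch (the container name prometheus is an assumption about the pod spec):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-watch-ssltargets-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: prometheus
  minReplicas: 1
  maxReplicas: 5
  metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: prometheus   # illustrative: the container whose CPU drives scaling
        target:
          type: Utilization
          averageUtilization: 70
```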