GKE Autopilot has scaled up my container resources contrary to resource requests - kubernetes

I have a container running in a GKE autopilot K8s cluster. I have the following in my deployment manifest (only relevant parts included):
apiVersion: apps/v1
kind: Deployment
spec:
  template:
    spec:
      containers:
        - resources:
            requests:
              memory: "250Mi"
              cpu: "512m"
So I've requested the minimum resources that GKE Autopilot allows for normal Pods. Note that I have not specified any limits.
However, after applying the manifest and inspecting the resulting YAML, I see that it does not match the manifest I applied:
resources:
  limits:
    cpu: 750m
    ephemeral-storage: 1Gi
    memory: 768Mi
  requests:
    cpu: 750m
    ephemeral-storage: 1Gi
    memory: 768Mi
Any idea what's going on here? Why has GKE scaled up the resources? This is costing me more money than necessary.
Interestingly, it was working as intended until recently; this behaviour only seemed to start in the past few days.

If the resources you've requested are the following:
memory: "250Mi"
cpu: "512m"
then they are not compliant with the minimum amount of resources that GKE Autopilot will assign. Please take a look at the documentation:
NAME                 Normal Pods
CPU                  250 mCPU
Memory               512 MiB
Ephemeral storage    10 MiB (per container)
-- Cloud.google.com: Kubernetes Engine: Docs: Concepts: Autopilot overview: Allowable resource ranges
As you can see, the amount of memory you requested was below the minimum, which is why you saw the following message (and why the manifest was modified to increase the requests/limits):
Warning: Autopilot increased resource requests for Deployment default/XYZ to meet requirements. See http://g.co/gke/autopilot-resources.
To fix that, you will need to assign resources that fall within the allowable ranges described in the documentation linked above.
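For example, a minimal sketch of a compliant resources block (the values are examples, not taken from your manifest). Note that Autopilot also rounds CPU requests up to the nearest 250 mCPU increment and adjusts memory so the CPU-to-memory ratio stays within its supported range, which would explain why 512m became 750m and memory became 768Mi:
resources:
  requests:
    cpu: "500m"      # a multiple of 250m, at or above the 250 mCPU minimum
    memory: "512Mi"  # at or above the 512 MiB minimum for normal Pods
With requests like these, Autopilot should leave the manifest unchanged instead of mutating it.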

Related

Horizontal Pod Autoscaling and resource configuration calibration

I am trying to understand how the HPA works, but I have some concerns.
If my service is configured like this:
resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi
and I configure the HPA in this way:
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: test-service
  minReplicas: 3
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 50
Does this prevent my service from reaching the limit (500m)?
Is it better to configure a higher value, like 80%?
I have this doubt because, with this configuration, I see pods scaled to the maximum number even though they are using less CPU than the limit:
NAME                            CPU(cores)   MEMORY(bytes)
test-service-76f8b8c894-2f944   189m         283Mi
test-service-76f8b8c894-2ztt6   183m         278Mi
test-service-76f8b8c894-4htzg   117m         233Mi
test-service-76f8b8c894-5hxhv   142m         193Mi
test-service-76f8b8c894-6bzbj   140m         200Mi
test-service-76f8b8c894-6sj5m   149m         261Mi
The amount of CPU used is less than the request configured in the definition of the service.
Moreover, I have seen that this has been discussed here as well, but I didn't find a clear answer:
Using Horizontal Pod Autoscaling along with resource requests and limits
Does this prevent my service from reaching the limit (500m)?
No, the HPA is not preventing it (although resources.limits is). What the HPA does is start new replicas when the average CPU utilization across all pods rises above 50% of the requested CPU, i.e. above 125m.
Is it better to configure a higher value, like 80%?
Can't say, it is application specific.
Horizontal autoscaling is pretty well described in the documentation.
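As a rough worked example of why the replica count sits at the maximum (using the scaling formula from the Kubernetes HPA documentation and assuming an average CPU usage of about 150m per pod, as in the kubectl top output above):
desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)
                = ceil(6 * 150m / 125m)   # the target is 50% of the 250m request, i.e. 125m
                = ceil(7.2) = 8           # capped at maxReplicas = 6
So even though each pod stays well below its 500m limit, it is above the 125m utilization target, and the HPA keeps the Deployment at maxReplicas.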

Why is my deployment not using the requested CPU in Kubernetes Minikube?

I have created a deployment with the following resources:
resources:
  requests:
    memory: "128Mi"
    cpu: "0.45"
  limits:
    memory: "128Mi"
    cpu: "0.8"
Using the Minikube metrics server I can see that my pod's CPU usage is below the requested 450m and is only around 150m. Shouldn't it always use 450m as a minimum, since I requested that in my .yaml file? The CPU usage goes up only if I dramatically increase the workload of the deployment. Can I have my deployment use 450m as a baseline and never go below that value?
The requested value is a hint for the scheduler to help it place the workload well. If your application does not make use of the requested resources, that is fine.
The limit ensures that no more resources are used: for CPU the container is throttled; if more RAM is used, the workload is killed (out of memory).
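If you want to compare what the pod actually uses with what it requested (a sketch; the pod name is a placeholder, and metrics-server must be enabled, e.g. with minikube addons enable metrics-server):
# current CPU/memory usage of every pod (requires metrics-server)
kubectl top pods
# the requests and limits that were actually applied to a pod
kubectl describe pod <pod-name> | grep -A 2 -E "Requests|Limits"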

Kubernetes Resource Requests and Limits

I'm new to Kubernetes. I'm just wondering: is there any downside if I set the values for Kubernetes container resource requests and limits as high as possible, like this?
resources:
  limits:
    cpu: '3'
    memory: 1Gi
  requests:
    cpu: '2'
    memory: 256Mi
You should set requests to the minimum values your pod needs and limits to the maximum you allow it to use. This helps Kubernetes schedule pods properly.
If the requests value is too high, Kubernetes may not have any node that fulfills those requirements, and your pod may not run at all.
Check this link for more details: https://sysdig.com/blog/kubernetes-limits-requests/
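As a sketch of that principle (the numbers are placeholders; in practice they should come from observing your application's real usage):
resources:
  requests:
    cpu: 250m       # roughly what the app needs at steady state
    memory: 256Mi
  limits:
    cpu: '1'        # CPU above this is throttled
    memory: 512Mi   # exceeding this gets the container OOM-killed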

Google Kubernetes Engine (GKE) CPU/pod

On GKE I have created a cluster with 1 node and the n1-standard-1 instance type (vCPU: 1, RAM: 3.75 GB). The main purpose of the cluster is to host an application that has 3 pods (mysql, backend and frontend) in the default namespace. I can deploy mysql with no problem. After that, when I try to deploy the backend, it just remains in the "Pending" state, saying that not enough CPU is available. The message is very verbose.
So my question is: is it not possible to have 3 pods running using 1 CPU unit? What I want is to reduce cost and let those pods share the same CPU. Is it possible to achieve that? If yes, then how?
The "Pending" status on its own is not that informative. Could you please run
kubectl get pods
to get your pod name, and then run
kubectl describe pod {podname}
That should give you a better idea of the underlying error message.
By the way, you can run 3 pods on a single CPU.
Yes, it is possible to have multiple pods, or 3 in your case, on a single CPU unit.
If you want to manage your CPU and memory resources, consider setting constraints such as those described in the official docs. Below is an example.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
    - name: db
      image: mysql
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: "password"
      resources:
        requests:
          memory: "64Mi"
          cpu: "250m"
        limits:
          memory: "128Mi"
          cpu: "500m"
One would need more information regarding your deployment to answer your queries in more detail. Please consider providing it.
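One more thing worth checking: part of each node's CPU is reserved for system components, so on a 1 vCPU node the allocatable CPU for your pods is noticeably less than 1000m. A quick way to see what is available and what is already requested (the node name is a placeholder):
kubectl get nodes
kubectl describe node <node-name>   # compare "Allocatable" with "Allocated resources"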

Kubernetes: How to apply Horizontal Pod Autoscaling (HPA) for an RC which contains multiple containers?

I have tried using an HPA for an RC which contains only one container, and it works perfectly fine. But when I have an RC with multiple containers (i.e., a pod containing multiple containers), the HPA is unable to scrape the CPU utilization and shows the status as "unknown", as shown below. How can I successfully implement an HPA for an RC with multiple containers? The Kubernetes docs have no information regarding this, and I also didn't find any mention of it not being possible. Can anyone please share their experience or point of view with regard to this issue? Thanks a lot.
prometheus-watch-ssltargets-hpa ReplicationController/prometheus <unknown> / 70% 1 10 0 4s
Also for your reference, below is my HPA yaml file.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-watch-ssltargets-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: prometheus
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
By all means it is possible to set an HPA for an RC/Deployment/ReplicaSet with multiple containers. In my case the problem was the format of the resource request. I figured out from this link that if the pod's containers do not have the relevant resource requests set, CPU utilization for the pod will not be defined and the HPA will not take any action for that metric. In my case I was using the resource request below, which caused the error. (Please note that the following resource request format works absolutely fine when I use it with deployments, replication controllers, etc.; it only caused the problem mentioned in the question when I additionally wanted to implement an HPA.)
resources:
  limits:
    cpu: 2
    memory: 200M
  requests:
    cpu: 1
    memory: 100Mi
But after changing it as below (i.e., with relevant resource requests set in a form the HPA can understand), it works fine.
resources:
  limits:
    cpu: 2
    memory: 200Mi
  requests:
    cpu: 1
    memory: 100Mi
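To confirm that the HPA can now compute utilization (a sketch using the HPA name and namespace from the manifest above), check whether the TARGETS column shows a percentage instead of <unknown>:
kubectl get hpa prometheus-watch-ssltargets-hpa -n monitoring
kubectl describe hpa prometheus-watch-ssltargets-hpa -n monitoring   # shows conditions and why a metric could not be computed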