Google Kubernetes Engine (GKE) CPU/pod

On GKE I have created a cluster with 1 node of the n1-standard-1 instance type (vCPU: 1, RAM: 3.75 GB). The main purpose of the cluster is to host an application that has 3 pods (mysql, backend and frontend) in the default namespace. I can deploy mysql with no problem. After that, when I try to deploy the backend, it just remains in "Pending" state, saying that not enough CPU is available. The message is very verbose.
So my question is: is it not possible to have 3 pods running using 1 CPU unit? I want to reduce cost and let those pods share the same CPU. Is it possible to achieve that? If yes, then how?

The error message "Pending" is not that informative. Could you please run
kubectl get pods
to get your pod name, and then run
kubectl describe pod {podname}
so you can see the actual error message?
By the way, you can run 3 pods on a single CPU.
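For reference, a pod that is Pending for lack of CPU typically shows a FailedScheduling event along these lines (illustrative output, not taken from the asker's cluster):
Events:
  Type     Reason            Age  From               Message
  ----     ------            ---  ----               -------
  Warning  FailedScheduling  10s  default-scheduler  0/1 nodes are available: 1 Insufficient cpu.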

Yes, it is possible to have multiple pods, or 3 in your case, on a single CPU unit.
If you want to manage your CPU and memory resources, consider setting requests and limits such as those described in the official docs. Below is an example.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    env:
    - name: MYSQL_ROOT_PASSWORD
      value: "password"
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
      limits:
        memory: "128Mi"
        cpu: "500m"
More information about your deployment would be needed to answer your question in more detail. Please consider providing it.
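Note also that not all of the node's 1 vCPU is schedulable: GKE reserves a slice for system daemons, so the pods' combined CPU requests must fit within the node's allocatable CPU, not the full vCPU. You can check the actual figure with:
kubectl describe node <node-name> | grep -A 5 Allocatable
For example, three pods requesting 250m each (750m in total) would fit on such a node.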

Related

Kubernetes. What happens if the request size is greater than the pod's RAM?

I don't understand how replication works in Kubernetes.
I understand that two replicas on different nodes will provide fault tolerance for the application, but I don’t understand this:
Suppose the application is given the following resources:
---
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "1G"
        cpu: "1"
      limits:
        memory: "1G"
        cpu: "1"
The application has two replicas. Thus, in total, 2 CPUs and 2G of RAM are available to the application.
But what happens if the application receives a request with a size of 1.75G? After all, only 1G RAM is available in one replica. Will the request be distributed among all replicas?
Answer for Harsh Manvar
Maybe you misunderstood me?
What you explained is not entirely true.
Here is a real, working deployment of four replicas:
$ kubectl get deployment dev-phd-graphql-server-01-master-deployment
NAME                                          READY   UP-TO-DATE   AVAILABLE   AGE
dev-phd-graphql-server-01-master-deployment   4/4     4            4           6d15h
$ kubectl describe deployment dev-phd-graphql-server-01-master-deployment
...
    Limits:
      cpu:     2
      memory:  4G
    Requests:
      cpu:     2
      memory:  4G
...
No, it won't get distributed. One replica will simply start and the other will stay in Pending state.
If you describe that pending Pod (replica), it will show an error like this:
0/1 nodes available: insufficient cpu, insufficient memory
kubectl describe pod POD-name
K8s will check for the requested resources:
requests:
  memory: "1G"
  cpu: "1"
If the minimum requested resources are available, it will deploy the replica; otherwise it will stay in Pending state.
Update
But what happens if the application receives a request with a size of 1.75G? After all, only 1G RAM is available in one replica.
requests:
  memory: "1G"
  cpu: "1"
limits:
  memory: "1G"
  cpu: "1"
If you have set a limit of 1 GB and the application starts using 1.75 GB, the Pod will be killed or restarted for hitting the limit.
But yes, in some cases a container can exceed its request if the node has memory available:
A Container can exceed its memory request if the Node has memory
available. But a Container is not allowed to use more than its memory
limit. If a Container allocates more memory than its limit, the
Container becomes a candidate for termination. If the Container
continues to consume memory beyond its limit, the Container is
terminated. If a terminated Container can be restarted, the kubelet
restarts it, as with any other type of runtime failure.
Read more at: https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/#exceed-a-container-s-memory-limit
You might also want to read: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-limits-are-run
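A common pattern that follows from this, sketched below with illustrative values, is to set the memory request lower than the limit so the container can burst when the node has spare memory:
resources:
  requests:
    memory: "1G"   # reservation the scheduler uses to place the pod
  limits:
    memory: "2G"   # burst ceiling; exceeding it makes the container a candidate for termination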
For these kinds of circumstances, you need to have an idea of how large a request can get, and set up your resource requests and limits accordingly.
If you feel a request can be as big as 1.75 GB, you have to tackle it in your source code.
For example, you might have a conversion job which takes a lot of resources. You can make it a Celery task and host the Celery worker in another node group which is made for large tasks (an AWS t3.xlarge, for example).
Anyway, such large tasks will not generate a result immediately, so I don't see a problem in running them asynchronously and giving back the result later, maybe even in a WebSocket message. This will keep your main server from getting clogged up and will also help you scale your large tasks efficiently.
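If you do split heavy jobs onto a dedicated node group, a sketch like the following pins the worker pods to it (the label, image name, and resource values here are hypothetical):
spec:
  nodeSelector:
    workload-type: heavy-tasks   # hypothetical label applied to the large-instance node group
  containers:
  - name: worker
    image: registry.example.com/worker:latest   # hypothetical worker image
    resources:
      requests:
        memory: "4G"
        cpu: "2"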

GKE Autopilot - Containers stuck in init phase on a particular node

I'm using GKE's Autopilot cluster to run some Kubernetes workloads. Pods scheduled to one particular node get stuck in the init phase for around 10 minutes; the same pod on a different node is up in seconds.
deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: jobs
spec:
  replicas: 1
  selector:
    matchLabels:
      app: job
  template:
    metadata:
      labels:
        app: job
    spec:
      volumes:
      - name: shared-data
        emptyDir: {}
      initContainers:
      - name: init-volume
        image: gcr.io/dummy_image:latest
        imagePullPolicy: Always
        resources:
          limits:
            memory: "1024Mi"
            cpu: "1000m"
            ephemeral-storage: "10Gi"
        volumeMounts:
        - name: shared-data
          mountPath: /data
        command: ["/bin/sh","-c"]
        args:
        - cp -a /path /data;
      containers:
      - name: job-server
        resources:
          requests:
            ephemeral-storage: "5Gi"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
            ephemeral-storage: "10Gi"
        image: gcr.io/jobprocessor:latest
        imagePullPolicy: Always
        volumeMounts:
        - name: shared-data
          mountPath: /ebdata1
This happens only if the pod has an init container. In my case, I'm copying some data from a dummy container to a shared volume which I then mount in the actual container.
Whenever pods get scheduled to this particular node, they get stuck in the init phase for around 10 minutes and then the problem resolves automatically. I couldn't see any errors in the event logs.
kubectl describe node problematic-node
Events:
  Type     Reason      Age  From            Message
  ----     ------      ---  ----            -------
  Warning  SystemOOM   52m  kubelet         System OOM encountered, victim process: cp, pid: 477887
  Warning  OOMKilling  52m  kernel-monitor  Memory cgroup out of memory: Killed process 477887 (cp) total-vm:2140kB, anon-rss:564kB, file-rss:768kB, shmem-rss:0kB, UID:0 pgtables:44kB oom_score_adj:-997
The only message is the warning above. Is this issue caused by some misconfiguration on my side?
The best recommendation is to manage container compute resources properly within your Kubernetes cluster. When creating a Pod, you can optionally specify how much CPU and memory (RAM) each container needs, in order to avoid OOM situations.
When Containers have resource requests specified, the scheduler can make better decisions about which nodes to place Pods on. And when Containers have their limits specified, contention for resources on a node can be handled in a specified manner. CPU specifications are in units of cores, and memory is specified in units of bytes.
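As a minimal illustration of those units, cpu: "500m" requests half a core and memory: "128Mi" requests 128 mebibytes:
resources:
  requests:
    cpu: "500m"      # 500 millicores = 0.5 core
    memory: "128Mi"  # 128 * 2^20 bytes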
An event is produced each time the scheduler fails; use the command below to see the status of events:
$ kubectl describe pod <pod-name> | grep Events
Also, read the official Kubernetes guide on "Configure Out Of Resource Handling". Always make sure to:
- reserve 10-20% of memory capacity for system daemons like the kubelet and the OS kernel
- identify pods which can be evicted at 90-95% memory utilization, to reduce thrashing and the incidence of system OOM
To facilitate this kind of scenario, the kubelet would be launched with options like below:
--eviction-hard=memory.available<xMi
--system-reserved=memory=yGi
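For a node of this size, the x and y placeholders might be filled in roughly like this (illustrative values only, to be tuned for your workload):
--eviction-hard=memory.available<500Mi
--system-reserved=memory=1Gi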
Having container monitoring such as Heapster in place helps with visualization.
Further reading: Kubernetes and Docker administration.

How can I use K8s autoscaling with a Redis Cluster from a Spring Boot application that uses Spring Data (Jedis)?

Do I need to list all nodes from my Redis Cluster in the spring.redis.sentinel.nodes attribute? Is that right?
I want to run a Redis Cluster on K8s to use the autoscaling provided by K8s. To use autoscaling, is it necessary to list all nodes in spring.redis.sentinel.nodes?
Good question 💯.
The short answer is that you typically don't do autoscaling with stateful apps like Redis, since you have to be careful about not corrupting your data. Most of the time you migrate and shard your data, i.e. multiple clusters with different segments of your data, etc.
Having said that, there is no silver-bullet Redis autoscaling solution, but it's doable with a lot of monitoring and testing 🦄. A challenge here is that sentinels change the master in case of failover, so your solution needs to be able to determine or monitor who the master is at a certain interval; this is very critical during downscales. Redis has written a pretty good guide on how to create clients, which you will probably have to follow/understand if you want a reliable autoscaling solution.
So the idea here 💡 is that you start with a set of sentinel/redis nodes managed by a Kubernetes Operator. With some config like this:
apiVersion: databases.spotahome.com/v1
kind: RedisFailover
metadata:
  name: redisfailover
spec:
  sentinel:
    replicas: 3
    resources:
      requests:
        cpu: 100m
      limits:
        memory: 100Mi
  redis:
    replicas: 3
    resources:
      requests:
        cpu: 100m
        memory: 100Mi
      limits:
        cpu: 400m
        memory: 500Mi
Then maybe modify the controller of this operator to autoscale based on certain metrics (CPU, memory, storage, etc.).
The moment there is an autoscale operation, you will have to make a configuration change in your Spring Boot application to account for it (say, in the ConfigMap of your application). For example, automatically change the value of this:
spring:
  cache:
    type: redis
  redis:
    port: 6666
    password: 123pwd
    sentinel:
      master: masterredis
      nodes:
      - 10.0.0.16
      - 10.0.0.17
      - 10.0.0.18
    lettuce:
      shutdown-timeout: 200ms
Now, after the config change, you need to do a rolling restart to prevent any downtime. The best way to do this in Kubernetes, in my opinion, is to have another Operator (or to extend the Redis operator) for your application: its controller automatically detects when there is a scaling operation, makes the ConfigMap change, and finally does the rolling restart of your app. Your scaling operations need to allow enough time for rebalancing of data, and also for the rolling restart, to prevent any thrashing/starvation and possible downtime/data corruption.
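For the rolling restart itself, a minimal sketch (assuming your app runs as a Deployment named my-app) is:
kubectl rollout restart deployment/my-app
kubectl rollout status deployment/my-app
rollout restart re-creates the pods gradually, so they pick up the updated ConfigMap without downtime.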
✌️☮️

Kubernetes: How to apply Horizontal Pod Autoscaling (HPA) to an RC which contains multiple containers?

I have tried using an HPA for an RC which contains only one container, and it works perfectly fine. But when I have an RC with multiple containers (i.e., a pod containing multiple containers), the HPA is unable to scrape the CPU utilization and shows the status as "unknown", as shown below. How can I successfully implement an HPA for an RC with multiple containers? The Kubernetes docs have no information regarding this, and I didn't find any mention of it not being possible. Can anyone please share their experience or point of view with regard to this issue? Thanks a lot.
prometheus-watch-ssltargets-hpa   ReplicationController/prometheus   <unknown>/70%   1   10   0   4s
Also for your reference, below is my HPA yaml file.
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: prometheus-watch-ssltargets-hpa
  namespace: monitoring
spec:
  scaleTargetRef:
    apiVersion: v1
    kind: ReplicationController
    name: prometheus
  minReplicas: 1
  maxReplicas: 5
  targetCPUUtilizationPercentage: 70
By all means it is possible to set an HPA for an RC/Deployment/ReplicaSet with multiple containers. In my case the problem was the format of the resource request. I figured out from this link that if the pod's containers do not have the relevant resource requests set, CPU utilization for the pod will not be defined and the HPA will not take any action for that metric. In my case I was using the resource request below, which caused the error. (Note that the following resource request format works absolutely fine when I use it with deployments, replication controllers, etc.; it was only when I additionally wanted to implement an HPA that it caused the problem mentioned in the question.)
resources:
  limits:
    cpu: 2
    memory: 200M
  requests:
    cpu: 1
    memory: 100Mi
But after changing it as below (i.e., with a relevant resource request set that the HPA can understand), it works fine.
resources:
  limits:
    cpu: 2
    memory: 200Mi
  requests:
    cpu: 1
    memory: 100Mi
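Once every container in the pod has a valid CPU request, the HPA should report a utilization figure instead of <unknown>; you can verify with:
kubectl get hpa prometheus-watch-ssltargets-hpa -n monitoring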

Kubernetes - node capacity

I'm running a small node in gcloud with 2 pods running. The Google Cloud console shows the following resource utilization:
<40% CPU utilization
about 8k network bytes
about 64 disk bytes
When adding the next pod, it fails with the error below.
FailedScheduling: Failed for reason PodExceedsFreeCPU and possibly others
Based on the numbers I see in the Google console, ~60% CPU is available. Is there any way to get more logs? Am I missing something obvious here?
Thanks in advance!
Since Kubernetes reserves capacity based on what pods request rather than what they actually use, you should check the capacity allocated on the node instead of the utilization:
kubectl describe nodes
You can find a deeper description of node capacity at: http://kubernetes.io/docs/user-guide/compute-resources/
In your Helm chart or Kubernetes YAML, check the resources section. Even if you have free capacity, if your request would put the node over its allocatable capacity, the pod will fail to schedule, even though it wouldn't actually use that much. The request asks for a reservation of capacity. For example:
spec:
  serviceAccountName: xxx
  containers:
  - name: xxx
    image: xxx
    command:
    - cat
    tty: true
    resources:
      requests:
        memory: "256Mi"
        cpu: "250m"
      limits:
        memory: "256Mi"
        cpu: "250m"
If the value for cpu there would make the cluster oversubscribed, the pod won't schedule. So make sure your requests reflect actual typical usage. If your requests do reflect actual typical usage, then you need more capacity.
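If actual usage is well below the request, a sketch like the following (values are hypothetical) lowers the reservation so the pod can fit; the trade-off is that the pod gets throttled or evicted sooner under pressure:
resources:
  requests:
    memory: "128Mi"
    cpu: "100m"    # lowered reservation; must still cover typical usage
  limits:
    memory: "256Mi"
    cpu: "250m"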