Kubernetes pod not starting up

I would like to create a Kubernetes Pod on a specific node pool (in AKS, k8s v1.18.19) that has the (only) taint for=devs:NoSchedule and the (only) label for: devs. The Pod should have at least 4 CPU cores and 12 GB of memory available. The node pool has size Standard_B12ms (so 12 vCPUs and 48 GB RAM) and the single node on it runs AKSUbuntu-1804gen2-2021.07.17. The node has status "Ready".
When I start the Pod with kubectl apply -f mypod.yaml, the Pod is created but the status is stuck in ContainerCreating. When I reduce the resource requirements to 1 vCPU and 2 GB of memory it starts fine, so it seems that 4 vCPUs and 12 GB of memory is too large, but I don't understand why.
kind: Pod
apiVersion: v1
metadata:
  name: user-ubuntu
  labels:
    for: devs
spec:
  containers:
  - name: user-ubuntu
    image: ubuntu:latest
    command: ["/bin/sleep", "3650d"]
    imagePullPolicy: IfNotPresent
    resources:
      requests:
        cpu: "4"
        memory: 12G
      limits:
        cpu: "6"
        memory: 20G
    volumeMounts:
    - mountPath: "/mnt/azure"
      name: volume
  restartPolicy: Always
  volumes:
  - name: volume
    persistentVolumeClaim:
      claimName: pvc-user-default
  tolerations:
  - key: "for"
    operator: "Equal"
    value: "devs"
    effect: "NoSchedule"
  nodeSelector:
    for: devs
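To narrow down why the Pod hangs in ContainerCreating, the Pod's events and the node's allocatable capacity can be inspected; a quick sketch (the node name is a placeholder):
# The Events section usually explains a Pod stuck in ContainerCreating,
# e.g. volume attach/mount or image pull problems rather than CPU/memory.
kubectl describe pod user-ubuntu
# Allocatable shows what the node can actually offer after system reservations,
# which is less than the raw 12 vCPU / 48 GB of the VM size.
kubectl describe node <node-name> | grep -A 7 Allocatable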

Related

How to specify the memory request and limit in kubernetes pod

I am creating a Pod with one container. The container has a memory request of 200 MiB and a memory limit of 400 MiB. Here is the configuration file for the Pod:
apiVersion: v1
kind: Pod
metadata:
  name: memory-demo
  namespace: mem-example
spec:
  containers:
  - name: memory-demo-ctr
    image: polinux/stress
    resources:
      requests:
        memory: "200Mi"
      limits:
        memory: "400Mi"
    command: ["stress"]
    args: ["--vm", "1", "--vm-bytes", "150M", "--vm-hang", "1"]
The YAML above is not working as intended: the args section in the configuration file does not provide arguments to the container when it starts.
I tried to create a Kubernetes Pod with memory limits and requests, and it fails to be created.
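A quick way to check what actually reached the API server, assuming the manifest above was applied with these names, is to read the stored args and the container output:
# Show the args recorded for the container in the Pod spec.
kubectl get pod memory-demo -n mem-example -o jsonpath='{.spec.containers[0].args}'
# The stress output usually shows how the arguments were parsed.
kubectl logs memory-demo -n mem-example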

Kubernetes - free claimed resources of a Pod in a failed state

I have got the following template for a job:
apiVersion: batch/v1
kind: Job
metadata:
  name: "gpujob"
spec:
  completions: 1
  backoffLimit: 0
  ttlSecondsAfterFinished: 600000
  template:
    metadata:
      name: batch
    spec:
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: "test"
      containers:
      - name: myhub
        image: smat-jupyterlab
        env:
        - name: JUPYTERHUB_COOKIE_SECRET
          value: "sdadasdasda"
        resources:
          requests:
            memory: 500Gi
          limits:
            nvidia.com/gpu: 1
        command: ["/bin/bash", "/usr/local/bin/jobscript.sh", "smat-job"]
        volumeMounts:
        - name: data
          mountPath: /data
      restartPolicy: Never
      nodeSelector:
        dso-node-role: "inference"
As you can see, I request a lot of memory for the job. My question is: does the failed Pod free the claimed resources as soon as it is in a failed state? Due to regulations I have to keep Pods in the cluster for one week, otherwise I would just set a very low ttlSecondsAfterFinished. I have read a lot of contradicting statements in articles, but found nothing in the official docs.
TL;DR: Does a failed Pod free the claimed resources of the cluster? If not, what is a good way to do it?
Yes. A failed or completed Job leaves its containers in the Terminated state, and therefore the resources allocated to them are freed.
You can easily confirm this by using the command:
kubectl top pod
You should not see any pod associated with the failed job consuming resources.
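kubectl top shows live usage rather than what the scheduler has reserved. To confirm that a terminated Pod's requests are no longer counted against the node, the node's allocated resources can be checked as well (the node name is a placeholder):
# "Allocated resources" is computed from non-terminated Pods only, so Pods in
# Succeeded or Failed phase no longer contribute to it.
kubectl describe node <node-name> | grep -A 10 "Allocated resources"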

GKE Cluster with 2 node pools doesn't scale up for one of the node pools

I have two node pools, one with GPU and one with CPU only, and run 2 types of jobs, both of which should spawn a node from their relevant node pool:
apiVersion: batch/v1
kind: Job
metadata:
  generateName: cpu-job-
spec:
  template:
    spec:
      containers:
      - name: cpujob
        image: gcr.io/asd
        imagePullPolicy: Always
        command: ["/bin/sh"]
        args: ["-c", REDACTED]
        resources:
          requests:
            memory: "16000Mi"
            cpu: "8000m"
          limits:
            memory: "32000Mi"
            cpu: "16000m"
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-nodepool: cpujobs
  backoffLimit: 4
apiVersion: batch/v1
kind: Job
metadata:
  generateName: gpu-job-
spec:
  template:
    spec:
      containers:
      - name: gpu-job
        image: gcr.io/fliermapper/agisoft-image:latest
        imagePullPolicy: Always
        command: ["/bin/sh"]
        args: ["-c", REDACTED]
        resources:
          requests:
            memory: "16000Mi"
            cpu: "8000m"
          limits:
            memory: "32000Mi"
            cpu: "16000m"
      restartPolicy: Never
      nodeSelector:
        cloud.google.com/gke-nodepool: gpujobs
      tolerations:
      - key: nvidia.com/gpu
        value: present
        operator: Equal
  backoffLimit: 4
It works fine for the gpujobs pool, but for the CPU pool I get the following errors:
Warning FailedScheduling 4m40s (x1619 over 29h) default-scheduler 0/1 nodes are available: 1 node(s) didn't match Pod's node affinity/selector.
Normal NotTriggerScaleUp 3m53s (x8788 over 29h) cluster-autoscaler pod didn't trigger scale-up: 1 node(s) didn't match Pod's node affinity/selector, 1 in backoff after failed scale-up
I have the nodeSelector defined for the CPU pool, so why does it not recognise the correct node pool and scale up? It says the Pod's node affinity/selector didn't match. I have created the node pools and they are available. Do I need to define tolerations or taints to make this work?
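A way to verify that the cpujobs pool really carries the label the nodeSelector expects (the cluster, zone, and pool names below are placeholders):
# The selector can only match if cloud.google.com/gke-nodepool=cpujobs shows up here.
kubectl get nodes -L cloud.google.com/gke-nodepool
# Inspect the node pool definition, including any taints that would also need tolerations.
gcloud container node-pools describe cpujobs --cluster <cluster-name> --zone <zone>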

How often does kube-Scheduler refresh node resource data

I have a project to modify the scheduling policy. I deployed a large number of Pods at the same time, but they do not seem to be scheduled as expected. I think kube-scheduler caches the resource usage of nodes, so the Pods would need to be deployed in two batches.
The Pod YAML is as follows; I create multiple Pods through a shell loop (a sketch of the loop follows the manifest):
apiVersion: v1
kind: Pod
metadata:
  name: ${POD_NAME}
  labels:
    name: multischeduler-example
spec:
  schedulerName: my-scheduler
  containers:
  - name: pod-with-second-annotation-container
    image: ibmcom/pause:3.1
    resources:
      requests:
        memory: "1Gi"
        cpu: "1"
      limits:
        memory: "2Gi"
        cpu: "2"
I want to know how often kube-scheduler refreshes its cached node resource data.
I really appreciate any help with this.

How to enlarge memory limit for an existing pod

In OpenShift, how can I enlarge the memory limit for an existing pod from 2GB to 16GB? Currently it always runs out of memory.
You can raise the OOM limit with step 1 below, and lower the OOM-kill priority with step 2.
1. Check whether "resources.limits.memory" of your pod is configured with a sufficient size.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    resources:
      requests:
        memory: "2Gi"
      limits:
        memory: "16Gi"  # <--- the Out of Memory event is triggered when your application reaches this usage
  # ...
2. Configure the same size for "resources.requests.memory" and "resources.limits.memory" to give the Pod the lowest OOM-kill priority.
Refer to Quality of Service for more details.
// If limits and optionally requests are set (not equal to 0) for all resources and they are equal,
// then the container is classified as Guaranteed.
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: db
    image: mysql
    resources:
      requests:
        memory: "2Gi"  # <--- set the same size for memory
      limits:
        memory: "2Gi"  # <--- in the requests and limits sections
  # ...
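A quick way to confirm the resulting QoS class after applying such a manifest (pod name taken from the example above):
# "Guaranteed" is expected when requests and limits are set and equal for all resources.
oc get pod frontend -o jsonpath='{.status.qosClass}'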
Add this section to your deploymentConfig file.
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Never
    resources:
      limits:
        memory: "16Gi"
      requests:
        memory: "2Gi"
If the problem persists, I would suggest looking at the HPA (Horizontal Pod Autoscaler), which increases the number of pods based on CPU and memory utilization so that your application pod never gets killed. Check out this link for more info:
https://docs.openshift.com/container-platform/3.11/dev_guide/pod_autoscaling.html
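A minimal sketch of creating such an autoscaler, assuming a DeploymentConfig named frontend (the name and thresholds are placeholders):
# Scale between 1 and 10 replicas, targeting 80% CPU utilization.
oc autoscale dc/frontend --min 1 --max 10 --cpu-percent=80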
Most OOMKilled problems occur with Java applications, whose memory usage is normally driven by the heap. You can limit the heap by setting an environment variable in your deploymentConfig under the spec section.
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: Never
    resources:
      limits:
        memory: "16Gi"
      requests:
        memory: "2Gi"
    env:
    - name: JVM_OPTS
      value: "-Xms2048M -Xmx4048M"
Or you could use this:
env:
- name: JVM_OPTS
  value: "-XX:+UnlockExperimentalVMOptions -XX:+UseCGroupMemoryLimitForHeap -XX:MaxRAMFraction=1"
By using either of these environment variables, your application will respect the memory limit set at the pod/container level: it will not go beyond its limit and will run garbage collection as it approaches the memory limit.
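A quick way to check whether the variable reached the container and whether the flags were actually applied to the JVM (the pod name is a placeholder, and this assumes the image's start script passes JVM_OPTS to java):
# Confirm the variable is present in the container environment.
oc exec <pod-name> -- env | grep JVM_OPTS
# If the JVM runs as PID 1, its full command line shows the flags it was started with.
oc exec <pod-name> -- cat /proc/1/cmdline | tr '\0' ' '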
I hope this will solve your problem.