containerd - cannot update memory of a running container lower than its current memory - kubernetes

I am using the 'crictl' tool to work with containerd-runtime containers (under Kubernetes) in a managed cluster.
I'm trying to set the memory limit (in bytes) to 16MB with the command:
crictl -r unix:///run/containerd/containerd.sock update --memory 16777216 c60df9ef3381e
And get the following error:
E1219 11:10:11.616194 1241 remote_runtime.go:640] "UpdateContainerResources from runtime service failed" err=<
rpc error: code = Unknown desc = failed to update resources: failed to update resources: /usr/bin/runc did not terminate successfully: exit status 1: unable to set memory limit to 16777216 (current usage: 97058816, peak usage: 126517248)
: unknown
> containerID="c60df9ef3381e"
FATA[0000] updating container resources for "c60df9ef3381e": rpc error: code = Unknown desc = failed to update resources: failed to update resources: /usr/bin/runc did not terminate successfully: exit status 1: unable to set memory limit to 16777216 (current usage: 97058816, peak usage: 126517248)
: unknown
At first I thought that maybe I cannot directly set a memory limit on a running container that is lower than the limit declared in the Kubernetes YAML.
Here are the limits from K8s:
Requests:{"cpu":"100m","memory":"64Mi"} Limits:{"cpu":"200m","memory":"128Mi"}
But no, even setting a memory limit above the K8s request (e.g. 65MB) gives the same error!
This works on the Docker runtime - I'm able to limit the container's memory. Yes, it might crash, but the operation works.
Then, I tried to give a memory limit higher than the current usage, and it succeeded...
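For reference, one way to check the container's current usage before picking a new limit is to read it straight from the memory cgroup (a rough sketch assuming cgroup v1; the exact path depends on the cgroup driver and pod layout):
# Locate the container's memory cgroup by (short) container ID and read its current usage
CID=c60df9ef3381e
CG=$(find /sys/fs/cgroup/memory -type d -name "*${CID}*" | head -n 1)
cat "${CG}/memory.usage_in_bytes"
# Then pick a limit above that value, e.g. 128MB:
crictl -r unix:///run/containerd/containerd.sock update --memory 134217728 ${CID}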
Can anyone help me understand this error and what might be causing it on the containerd runtime? Is this indeed a limitation, i.e. that I cannot set a limit lower than the memory currently used by the container? Is there a way to overcome that?
Thanks a lot for your time!!!

Related

Cannot allocate GPU in Slurm

I've got a problem allocating GPU resources on a Slurm cluster.
When I specify 1 GPU and run as shown below, it says that gres resources cannot be allocated. The same happens if I request more than one.
$ srun --gres=gpu:1 --pty bash
srun: error: Unable to create step for job 73: Invalid generic resource (gres) specification
The compute nodes' gres information seems to come out correctly, as shown below:
$ sinfo -o "%20N %10c %10m %25f %10G "
NODELIST CPUS MEMORY AVAIL_FEATURES GRES
gpu_svr[1-4] 72 515484 (null) gpu:8
The node configuration in slurm.conf is as below:
/etc/slurm/slurm.conf
GresTypes=gpu
NodeName=gpu_svr1 NodeAddr=x.x.x.1 CPUs=72 RealMemory=515484 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Gres=gpu:8 State=UNKNOWN
NodeName=gpu_svr2 NodeAddr=x.x.x.2 CPUs=72 RealMemory=515484 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Gres=gpu:8 State=UNKNOWN
NodeName=gpu_svr3 NodeAddr=x.x.x.3 CPUs=72 RealMemory=515484 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Gres=gpu:8 State=UNKNOWN
NodeName=gpu_svr4 NodeAddr=x.x.x.4 CPUs=72 RealMemory=515484 Sockets=2 CoresPerSocket=18 ThreadsPerCore=2 Gres=gpu:8 State=UNKNOWN
PartitionName=v100 Nodes=ALL Default=YES MaxTime=INFINITE State=UP
Here is gres.conf on the compute nodes:
gres.conf
NodeName=gpu_svr[1-4] Name=gpu File=/dev/nvidia[0-7]
Solved.
The following options should be set in slurm.conf:
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
JobAcctGatherType=jobacct_gather/cgroup
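After changing slurm.conf, the daemons need to re-read it; a rough sketch of applying and verifying the change (assuming systemd-managed Slurm daemons, node names taken from the question):
# Sync the edited /etc/slurm/slurm.conf to the controller and all compute nodes, then:
sudo systemctl restart slurmctld                # on the controller
sudo systemctl restart slurmd                   # on each compute node
scontrol show node gpu_svr1 | grep -i gres      # confirm the gres configuration is loaded
srun --gres=gpu:1 --pty bash                    # retry the allocation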

Enable custom kubernetes scheduler for a namespace

I have a k8s Job that brings up multiple pods. This Job is used for load testing, so all the pods need to come up at the same time. The Job shouldn't be started until nodes are available for all pods to be scheduled.
I came across kube-batch https://github.com/kubernetes-sigs/kube-batch to do this scheduling. I have a couple of questions:
1. How to enable kube-batch for only one namespace in a cluster?
2. I installed kube-batch by following the tutorial, but the pods are failing on startup with the error below. How do I resolve this error?
I1204 20:07:55.911393 1 allocate.go:96] Queue <default> is overused, ignore it.
I1204 20:07:55.911399 1 allocate.go:194] Leaving Allocate ...
I1204 20:07:55.911407 1 backfill.go:41] Enter Backfill ...
I1204 20:07:55.911413 1 backfill.go:71] Leaving Backfill ...
E1204 20:07:55.911521 1 runtime.go:69] Observed a panic: "invalid memory address or nil pointer dereference" (runtime error: invalid memory address or nil pointer dereference)
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:76
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:65
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:51
/usr/local/go/src/runtime/asm_amd64.s:522
/usr/local/go/src/runtime/panic.go:513
/usr/local/go/src/runtime/panic.go:82
/usr/local/go/src/runtime/signal_unix.go:390
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/framework/session.go:368
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/plugins/gang/gang.go:154
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/framework/framework.go:58
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/scheduler.go:102
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/pkg/scheduler/scheduler.go:85
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:134
/home/root1/servicecomb/go/src/github.com/kubernetes-sigs/kube-batch/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:88
/usr/local/go/src/runtime/asm_amd64.s:1333
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x148 pc=0x10ab979]
I'm not sure what you are trying to achieve is doable. In my opinion, what you can do is modify the pod's Dockerfile to include Supervisord. Then, in supervisord, specify the commands you want to run when the pod comes into the running state, using supervisord's priority setting.
Example
[program:api]
directory=/usr/local
command=go run main.go
priority=100
autostart=true
autorestart=true
stderr_logfile=/var/log/api.err.log
stdout_logfile=/var/log/api.out.log
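For this to work in a pod, supervisord itself has to run in the foreground as the container's entrypoint; a minimal Dockerfile sketch (file paths are assumptions, not from the question):
# Dockerfile excerpt: copy the supervisord config and run it in the foreground as PID 1
COPY supervisord.conf /etc/supervisord.conf
CMD ["supervisord", "-n", "-c", "/etc/supervisord.conf"]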

haproxy ulimit-n computation

I got a haproxy 1.8 vanilla alpine docker image running with maxconn = 2000
curl -s http://host:port/stats| grep maxsock
<b>maxsock = </b> 4017; <b>maxconn = </b> 2000; <b>maxpipes = </b> 0<br>
Sometimes I get the following Warning in my logs:
[WARNING] 0/0 (0) : [/usr/local/sbin/haproxy.main()] FD limit (4015) too low for maxconn=2000/maxsock=4016. Please raise 'ulimit-n' to 4016 or more to avoid any trouble.
I find it very odd, since I read this in the haproxy doc:
ulimit-n
Sets the maximum number of per-process file-descriptors to <number>. By default, it is automatically computed, so it is recommended not to use this option.
Not sure if it's a bug in haproxy or something I am doing wrong.
What do you think of that?
edit: haproxy is running as root
It depends on the open file descriptor limits (hard and soft); you can check them with ulimit -Hn and ulimit -Sn.
The value is computed automatically, but it also depends on the user haproxy runs as: if you run haproxy as root, then even if the computed value is greater than the hard limit, haproxy can set it without a warning.
But if you run it as a normal user, the maximum is the hard limit; if the computed value is greater than that, you get the warning.
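A quick sketch of checking and raising the limits for a containerized haproxy (the image tag, values, and paths are examples, not taken from the question):
# Check the hard and soft FD limits inside the running container
docker exec <haproxy-container> sh -c 'ulimit -Hn; ulimit -Sn'
# Start the container with a higher nofile limit so the computed value fits
docker run -d --ulimit nofile=4096:4096 \
  -v "$(pwd)/haproxy.cfg:/usr/local/etc/haproxy/haproxy.cfg:ro" haproxy:1.8-alpine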

Slow server caused by mongodb instance

I see that MongoDB's resource usage is extremely high. It shows it is using 756% of the CPU, and the load is at 4:
22527 root 20 0 0.232t 0.024t 0.023t S 756.2 19.5 240695:16 mongod
I checked the MongoDB logs and found that every query is taking more than 200ms to execute, which is causing the high resource usage and the speed issue.
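For reference, a rough sketch of that kind of check (the log path is the common default and the plain-text log format is assumed; adjust to your setup):
# Show the slowest recent operations logged by mongod (lines end with the duration in ms)
grep -E '[0-9]{3,}ms$' /var/log/mongodb/mongod.log | tail -n 20
# Watch live server activity and per-collection load
mongostat --rowcount 5
mongotop 5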

Mappers fail for pig to insert data into MongoDB

I am trying to import a file from HDFS to MongoDB using MongoInsertStorage with PIG. The files are large, around 5GB. The script runs fine when I run it in local mode with
pig -x local example.pig
However, if I run it in mapreduce mode, most of the mappers fail with the following error:
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
Can someone help me solve this issue? I also increased the memory allocated to the YARN containers, but that hasn't helped.
Some mappers are also timing out after 300 seconds.
The Pig script is as follows:
REGISTER mongo-java-driver-3.2.2.jar
REGISTER mongo-hadoop-core-1.4.0.jar
REGISTER mongo-hadoop-pig-1.4.0.jar
REGISTER mongodb-driver-3.2.2.jar
DEFINE MongoInsertStorage com.mongodb.hadoop.pig.MongoInsertStorage();
SET mapreduce.reduce.speculative true
BIG_DATA = LOAD 'hdfs://example.com:8020/user/someuser/sample.csv' using PigStorage(',') As (a:chararray,b:chararray,c:chararray);
STORE BIG_DATA INTO 'mongodb://insert.some.ip.here:27017/test.samplecollection' USING MongoInsertStorage('', '');
Found a solution.
For the error
Error: com.mongodb.ConnectionString.getReadConcern()Lcom/mongodb/ReadConcern;
Container killed by the ApplicationMaster.
Container killed on request.
Exit code is 143 Container exited with a non-zero exit code 143
I changed the JAR versions: mongo-hadoop-core and mongo-hadoop-pig from 1.4.0 to 2.0.2, and the Mongo Java driver from 3.2.2 to 3.4.2. This eliminated the ReadConcern error on the mappers!
For the timeout, I added this after registering the jars:
SET mapreduce.task.timeout 1800000
I had been using SET mapred.task.timeout, which didn't work.
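Putting both fixes together, the top of the script would look roughly like this (the JAR file names are assumptions based on the usual release naming for those versions):
REGISTER mongo-hadoop-core-2.0.2.jar
REGISTER mongo-hadoop-pig-2.0.2.jar
REGISTER mongo-java-driver-3.4.2.jar
DEFINE MongoInsertStorage com.mongodb.hadoop.pig.MongoInsertStorage();
SET mapreduce.task.timeout 1800000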
Hope this helps anyone who has a similar issue!