How to set BBR on GKE clusters?

What is the best way to enable BBR by default for my clusters?
In this link, I didn't see an option for controlling congestion control.

Google BBR can only be enabled on Linux operating systems. By default, Linux servers use Reno and CUBIC, but recent kernels (4.9 and later) also include Google's BBR algorithm, which can be enabled manually.
To enable it on CentOS 8, add the lines below to /etc/sysctl.conf and run sysctl -p:
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
For more Linux distributions you can refer to this link.

Maybe set it as part of your Deployment spec:
spec:
  initContainers:
    - name: sysctl-buddy
      image: busybox:1.29
      securityContext:
        privileged: true
      command: ["/bin/sh"]
      args:
        - -c
        - sysctl -w net.core.default_qdisc=fq net.ipv4.tcp_congestion_control=bbr
      resources:
        requests:
          cpu: 1m
          memory: 1Mi

Related

Dask Helm Chart - How to create Dask-CUDA-Worker nodes

I installed the Dask Helm Chart with a revised values.yaml to have 10 workers. However, instead of Dask workers I want to create Dask CUDA workers to take advantage of the NVIDIA GPUs on my multi-node cluster.
I tried to modify values.yaml as follows to get Dask CUDA workers instead of Dask workers, but the worker pods are unable to start. I already set up the NVIDIA GPUs on all my Kubernetes nodes per the official instructions, so I'm not sure what Dask needs to see in order to create the Dask CUDA workers.
worker:
  name: worker
  image:
    repository: "daskdev/dask"
    tag: 2.19.0
    pullPolicy: IfNotPresent
    dask_worker: "dask-cuda-worker"
    #dask_worker: "dask-worker"
    pullSecrets:
    #  - name: regcred
  replicas: 15
  default_resources:  # overwritten by resource limits if they exist
    cpu: 1
    memory: "4GiB"
  env:
  #  - name: EXTRA_APT_PACKAGES
  #    value: build-essential openssl
  #  - name: EXTRA_CONDA_PACKAGES
  #    value: numba xarray -c conda-forge
  #  - name: EXTRA_PIP_PACKAGES
  #    value: s3fs dask-ml --upgrade
  resources: {}
  #  limits:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1
  #  requests:
  #    cpu: 1
  #    memory: 3G
  #    nvidia.com/gpu: 1

Because dask-cuda-worker is not yet officially included in the dask image, you will need to pull a different image: rapidsai/rapidsai:latest
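As a rough sketch, the worker section of values.yaml might then look something like the fragment below. It assumes the chart keys are the same as in the question (image.repository, image.tag, image.dask_worker, resources) and that each worker should get one GPU, as in the commented-out block of the question; neither has been verified against a particular chart version.

```yaml
# Sketch only: key names copied from the question's values.yaml.
worker:
  name: worker
  image:
    repository: "rapidsai/rapidsai"   # image suggested in the answer
    tag: latest
    pullPolicy: IfNotPresent
    dask_worker: "dask-cuda-worker"
  resources:
    limits:
      nvidia.com/gpu: 1   # assumption: one GPU per worker pod
```

Pinning a specific image tag instead of latest is probably safer, so that the scheduler and workers stay on matching versions.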

How to make the chown command work in an NFS shared folder

I created an NFS file share and am using it in Kubernetes pods, but when I start the pods, I get this message:
2020-05-31 03:00:06+00:00 [Note] [Entrypoint]: Entrypoint script for MySQL Server 5.7.30-1debian10 started.
chown: changing ownership of '/var/lib/mysql/': Operation not permitted
From searching online, I understand that NFS by default maps remote root logins to the nfsnobody account, and that this error occurs when the permissions are not correct. I followed the suggested steps but still have not solved it. This is what I have tried:
1. Added the insecure option no_root_squash in /etc/exports:
/mnt/data/apollodb/apollopv *(rw,sync,no_subtree_check,no_root_squash)
2. Removed the PVC and PV and used NFS directly in the pod, like this:
volumes:
  - name: apollo-mysql-persistent-storage
    nfs:
      server: 192.168.64.237
      path: /mnt/data/apollodb/apollopv
containers:
  - name: mysql
    image: 'mysql:5.7'
    ports:
      - name: mysql
        containerPort: 3306
        protocol: TCP
    env:
      - name: MYSQL_ROOT_PASSWORD
        value: gfwge4LucnXwfefewegLwAd29QqJn4
    resources: {}
    volumeMounts:
      - name: apollo-mysql-persistent-storage
        mountPath: /var/lib/mysql
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    imagePullPolicy: IfNotPresent
restartPolicy: Always
terminationGracePeriodSeconds: 30
dnsPolicy: ClusterFirst
securityContext: {}
schedulerName: default-scheduler
This tells me the problem is not in the pod definition but in the NFS configuration itself.
3. Granted all permissions using this command:
chmod 777 /mnt/data/apollodb/apollopv
4. Changed ownership to nfsnobody, like this:
sudo chown nfsnobody:nfsnobody -R apollodb/
sudo chown 999:999 -R apollodb
The problem is still not solved, so what should I try to make it work?

You wouldn't set this via chown; you would use the fsGroup security context setting instead.
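A minimal sketch of what that could look like in the pod spec from the question. The value 999 is an assumption, taken from the chown 999:999 attempt above, and should match the group the MySQL process really runs as. Whether Kubernetes applies fsGroup ownership changes depends on the volume type, so it is worth verifying that it takes effect on an NFS volume.

```yaml
# Sketch only: fsGroup is set at the pod level, not on the container.
spec:
  securityContext:
    fsGroup: 999   # assumption: GID used by the mysql image
  containers:
    - name: mysql
      image: 'mysql:5.7'
      volumeMounts:
        - name: apollo-mysql-persistent-storage
          mountPath: /var/lib/mysql
```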

GPU resource limit

I’m having some trouble limiting my Pod’s access to the GPUs available on my cluster.
Here is my .yaml:
kind: Pod
metadata:
  name: train-gpu
spec:
  containers:
    - name: train-gpu
      image: index.docker.io/myprivaterepository/train:latest
      command: ["sleep"]
      args: ["100000"]
      resources:
        limits:
          nvidia.com/gpu: 1 # requesting 1 GPU
When I run the nvidia-smi command inside of this pod all of the GPUs show up, rather than just 1.
Any advice would be much appreciated.
Some information that may be useful:
Kubernetes version:
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.1", GitCommit:"d647ddbd755faf07169599a625faf302ffc34458", GitTreeState:"clean", BuildDate:"2019-10-07T14:30:40Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"15", GitVersion:"v1.15.3", GitCommit:"2d3c76f9091b6bec110a5e63777c332469e0cba2", GitTreeState:"clean", BuildDate:"2019-08-19T11:05:50Z", GoVersion:"go1.12.9", Compiler:"gc", Platform:"linux/amd64"}
Docker base image:
FROM nvidia/cuda:10.1-base-ubuntu18.04

Not able to start apache-nifi in aks

Hi all, I am working with NiFi and trying to install it on AKS (Azure Kubernetes Service).
I am using NiFi version 1.9.2. Installing it on AKS gives me this error:
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedSFiVwC’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedK3S1JJ’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedbcm91T’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedIuYSe1’: Operation not permitted
NiFi running with PID 28.
The specified run.as user nifi
does not exist. Exiting.
Received trapped signal, beginning shutdown...
Below is my nifi.yml file
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nifi-core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nifi-core
  template:
    metadata:
      labels:
        app: nifi-core
    spec:
      containers:
        - name: nifi-core
          image: my-azurecr.io/nifi-core-prod:1.9.2
          env:
            - name: NIFI_WEB_HTTP_PORT
              value: "8080"
            - name: NIFI_VARIABLE_REGISTRY_PROPERTIES
              value: "./conf/custom.properties"
          resources:
            requests:
              cpu: "6"
              memory: 12Gi
            limits:
              cpu: "6"
              memory: 12Gi
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: my-nifi-core-conf
              mountPath: /opt/nifi/nifi-current/conf
      volumes:
        - name: my-nifi-core-conf
          azureFile:
            shareName: my-file-nifi-core/nifi/conf
            secretName: my-nifi-secret
            readOnly: false
I have some customizations in the NiFi Dockerfile, which copies some config files related to my configuration. When I run the my-azurecr.io/nifi-core-prod:1.9.2 Docker image locally, it works as expected.
But when I try to run it on AKS, it gives the error above. Since it is related to permissions, I have tried both the nifi and root users in the Dockerfile.
All the required configuration files are provided in the volume my-nifi-core-conf, running in the same resource group.
Since I am starting NiFi with Docker, my expectation is that it will behave the same regardless of environment, whether locally or on AKS.
But the error also says that the user nifi does not exist. The official NiFi image sets up the required user.
Can anyone help? I can't even start the container in interactive mode, as the pod is not in a running state. Thanks in advance.

I think you're missing the security context definition for your Kubernetes Pod. The user that NiFi runs under within a Docker container has a specific UID and GID, and given the error message you're getting, I suspect that because that user is not defined in the Pod's security context, it's not launching as expected.
Have a look at the section of the Kubernetes documentation about security contexts; that should be enough to get you started.
I would also have a look at using something like Minikube when testing Kubernetes deployments, as Kubernetes adds a large number of controls around a container engine like Docker.
Security Contexts Docs: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
Minikube: https://kubernetes.io/docs/setup/learning-environment/minikube/
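For illustration, a pod-level security context in the Deployment from the question might look roughly like the fragment below. The value 1000 is an assumption, borrowed from the chown -R 1000:1000 used in a later answer in this thread; check which UID and GID the image actually uses before relying on it. fsGroup may also not take effect on every volume type.

```yaml
# Sketch only: UID/GID 1000 is assumed, not confirmed for this image.
spec:
  template:
    spec:
      securityContext:
        runAsUser: 1000
        runAsGroup: 1000
        fsGroup: 1000
      containers:
        - name: nifi-core
          image: my-azurecr.io/nifi-core-prod:1.9.2
```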

If you never figured this out, I was able to do this by running an initContainer before the main container, and changing the directory perms there.
initContainers:
  - name: init1
    image: busybox:1.28
    volumeMounts:
      - name: nifi-pvc
        mountPath: "/opt/nifi/nifi-current"
    command: ["sh", "-c", "chown -R 1000:1000 /opt/nifi/nifi-current"]  # or whatever you want to do as root

Update: this does not work with NiFi 1.14.0; it works with 1.13.2.

Error in running DPDK L2FWD application on a container managed by Kubernetes

I am trying to run the DPDK L2FWD application in a container managed by Kubernetes.
To achieve this, I have done the following:
I created a single-node Kubernetes setup where both the master and the client run on the host machine. As the network plug-in, I used Calico.
To create a customized DPDK Docker image, I used the Dockerfile below:
FROM ubuntu:16.04
RUN apt-get update
RUN apt-get install -y net-tools
RUN apt-get install -y python
RUN apt-get install -y kmod
RUN apt-get install -y iproute2
RUN apt-get install -y net-tools
ADD ./dpdk/ /home/sdn/dpdk/
WORKDIR /home/sdn/dpdk/
To run the DPDK application inside the pod, the following host directories are mounted into the pod with privileged access:
/mnt/huge
/usr
/lib
/etc
Below is the Kubernetes YAML used to create the pod:
apiVersion: v1
kind: Pod
metadata:
  name: dpdk-pod126
spec:
  containers:
    - name: dpdk126
      image: dpdk-test126
      imagePullPolicy: IfNotPresent
      command: ["/bin/sh"]
      args: ["-c", "while true; do echo hello; sleep 10;done"]
      resources:
        requests:
          memory: "2Gi"
          cpu: "100m"
      volumeMounts:
        - name: hostvol1
          mountPath: /mnt/huge
        - name: hostvol2
          mountPath: /usr
        - name: hostvol3
          mountPath: /lib
        - name: hostvol4
          mountPath: /etc
      securityContext:
        privileged: true
  volumes:
    - name: hostvol1
      hostPath:
        path: /mnt/huge
    - name: hostvol2
      hostPath:
        path: /usr
    - name: hostvol3
      hostPath:
        path: /home/sdn/kubernetes-test/libtest
    - name: hostvol4
      hostPath:
        path: /etc
The following configuration has already been done on the host:
Huge page mounting.
Interface binding in user space.
After the pod is created successfully, when I try to run the DPDK L2FWD application inside it, I get the error below:
root@dpdk-pod126:/home/sdn/dpdk# ./examples/l2fwd/build/l2fwd -c 0x0f -- -p 0x03 -q 1
EAL: Detected 16 lcore(s)
EAL: Detected 1 NUMA nodes
EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
EAL: No free hugepages reported in hugepages-1048576kB
EAL: 1007 hugepages of size 2097152 reserved, but no mounted hugetlbfs found for that size
EAL: FATAL: Cannot get hugepage information.
EAL: Cannot get hugepage information.
EAL: Error - exiting with code: 1
Cause: Invalid EAL arguments

According to this, you might be missing medium: HugePages from your hugepage volume.
Also, hugepages can be a bit finicky. Can you provide the output of:
cat /proc/meminfo | grep -i huge
and check whether there are any files in /mnt/huge?
Also maybe this can be helpful. Can you somehow check if the hugepages are being mounted as mount -t hugetlbfs nodev /mnt/huge?

First of all, verify that you have enough hugepages on your system. Check with this kubectl command:
kubectl describe nodes
where you should see something like this:
Capacity:
  cpu:                12
  ephemeral-storage:  129719908Ki
  hugepages-1Gi:      0
  hugepages-2Mi:      8Gi
  memory:             65863024Ki
  pods:               110
If hugepages-2Mi is empty, then Kubernetes doesn't see any mounted hugepages.
After mounting hugepages on your host, you can prepare your pod to work with them. You don't need to mount the hugepages folder as you have shown. You can simply add an emptyDir volume like this:
volumes:
  - name: hugepage-2mi
    emptyDir:
      medium: HugePages-2Mi
HugePages-2Mi is a specific name that corresponds to hugepages of 2 MiB size. If you want to use 1 GiB hugepages, there is a separate resource for them: hugepages-1Gi.
After defining the volume, you can use it in volumeMounts like this:
volumeMounts:
  - mountPath: /hugepages-2Mi
    name: hugepage-2mi
There is one additional step: you have to define resource limits for hugepage usage:
resources:
  limits:
    hugepages-2Mi: 128Mi
    memory: 128Mi
  requests:
    memory: 128Mi
After all these steps, you can run your container with hugepages available inside it.
As @AdamTL mentioned, you can find additional info here
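Putting the three fragments from this answer together, a complete pod manifest might look roughly like the sketch below. The pod name, container name, and sleep command are hypothetical placeholders; the image name is the one from the question, and the hugepage sizes are those used in the answer.

```yaml
# Sketch only: names marked "hypothetical" are placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: hugepages-example          # hypothetical name
spec:
  containers:
    - name: app                    # hypothetical name
      image: dpdk-test126          # image name from the question
      command: ["/bin/sh", "-c", "sleep 3600"]
      volumeMounts:
        - mountPath: /hugepages-2Mi
          name: hugepage-2mi
      resources:
        limits:
          hugepages-2Mi: 128Mi
          memory: 128Mi
        requests:
          memory: 128Mi
  volumes:
    - name: hugepage-2mi
      emptyDir:
        medium: HugePages-2Mi
```

A DPDK application in such a pod would presumably need to be pointed at the /hugepages-2Mi mount path rather than /mnt/huge.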
