GPU resource limit - kubernetes

I’m having some trouble limiting my Pods access to the GPUs available on my cluster.
Here is my .yaml:
kind: Pod
metadata:
name: train-gpu
spec:
containers:
- name: train-gpu
image: index.docker.io/myprivaterepository/train:latest
command: ["sleep"]
args: ["100000"]
resources:
limits:
nvidia.com/gpu: 1 # requesting 1 GPU
When I run the nvidia-smi command inside of this pod all of the GPUs show up, rather than just 1.
Any advice would be much appreciated.
Some information that may be useful:
Kubernetes version:
Client Version: version.Info{Major:“1”, Minor:“16”, GitVersion:“v1.16.1”, GitCommit:“d647ddbd755faf07169599a625faf302ffc34458”, GitTreeState:“clean”, BuildDate:“2019-10-07T14:30:40Z”, GoVersion:“go1.12.10”, Compiler:“gc”, Platform:“linux/amd64”}
Server Version: version.Info{Major:“1”, Minor:“15”, GitVersion:“v1.15.3”, GitCommit:“2d3c76f9091b6bec110a5e63777c332469e0cba2”, GitTreeState:“clean”, BuildDate:“2019-08-19T11:05:50Z”, GoVersion:“go1.12.9”, Compiler:“gc”, Platform:“linux/amd64”}
Docker base image:
FROM nvidia/cuda:10.1-base-ubuntu18.04

Related

Getting JAR file from S3 using Flink Kubernetes operator

I'm experimenting with the new Flink Kubernetes operator and I've been able to do pretty much everything that I need besides one thing: getting a JAR file from the S3 file system.
Context
I have a Flink application running in a EKS cluster in AWS and have all the information saved in a S3 buckets. Things like savepoints, checkpoints, high availability and JARs files are all stored there.
I've been able to save the savepoints, checkpoints and high availability information in the bucket, but when trying to get the JAR file from the same bucket I get the error:
Could not find a file system implementation for scheme 's3'. The scheme is directly supported by Flink through the following plugins: flink-s3-fs-hadoop, flink-s3-fs-presto.
I was able to get to this thread, but I wasn't able to get the resource fetcher to work correctly. Also the solution is not ideal and I was searching for a more direct approach.
Deployment files
Here's the files that I'm deploying in the cluster:
deployment.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
name: flink-deployment
spec:
podTemplate:
apiVersion: v1
kind: Pod
metadata:
name: pod-template
spec:
containers:
- name: flink-main-container
env:
- name: ENABLE_BUILT_IN_PLUGINS
value: flink-s3-fs-presto-1.15.3.jar;flink-s3-fs-hadoop-1.15.3.jar
volumeMounts:
- mountPath: /flink-data
name: flink-volume
volumes:
- name: flink-volume
hostPath:
path: /tmp
type: Directory
image: flink:1.15
flinkVersion: v1_15
flinkConfiguration:
state.checkpoints.dir: s3://kubernetes-operator/checkpoints
state.savepoints.dir: s3://kubernetes-operator/savepoints
high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
high-availability.storageDir: s3://kubernetes-operator/ha
jobManager:
resource:
memory: "2048m"
cpu: 1
taskManager:
resource:
memory: "2048m"
cpu: 1
serviceAccount: flink
session-job.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
name: flink-session-job
spec:
deploymentName: flink-deployment
job:
jarURI: s3://kubernetes-operator/savepoints/flink.jar
parallelism: 3
upgradeMode: savepoint
savepointTriggerNonce: 0
The Flink Kubernetes operator version that I'm using is 1.3.1
Is there anything that I'm missing or doing wrong?
The download of the jar happens in flink-kubernetes-operator pod. So, when you apply FlinkSessionJob, the fink-operator would recognize the Crd and will try to download the jar from jarUri location and construct a JobGraph and submit the sessionJob to JobDeployment. Flink Kubernetes Operator will also have flink running inside it to build a JobGraph.
So, You will have to add flink-s3-fs-hadoop-1.15.3.jar in location /opt/flink/plugins/s3-fs-hadoop/ inside flink-kubernetes-operator
You can add the jar either by extending the ghcr.io/apache/flink-kubernetes-operator image, curl the jar and copy it to plugins location
or
You can write an initContainer which will download the jar to a volume and mount that volume
volumes:
- name: s3-plugin
emptyDir: { }
initContainers:
- name: busybox
image: busybox:latest
volumeMounts:
- mountPath: /opt/flink/plugins/s3-fs-hadoop
name: s3-plugin
containers:
- image: 'ghcr.io/apache/flink-kubernetes-operator:95128bf'
name: flink-kubernetes-operator
volumeMounts:
- mountPath: /opt/flink/plugins/s3-fs-hadoop
name: s3-plugin
Also, if you are using serviceAccount for S3 authentication, give below config in flinkConfig
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider

The active users count dont match on execution through 2 containers in Kubernates

We are running jmx through Tauras using 2 containers in Kubernetes.
We are seeing only 50 users in results instead of 100(50*2 containers).
Can anyone please through some light if we are missing something here.
We get two jtl and checking them individual or combined the total users are same 50 only. Is it related to same Thread name being generated and logged in jtl file or something else.
Here is the yml details:
apiVersion: v1
kind: ConfigMap
metadata:
name: joba
namespace: AAA
data:
protocol: "https"
serverUrl: “testurl”
users: "50”
duration: "1m"
nodeName: "Nodename"
---
apiVersion: /v1
kind: Job
metadata:
name: perftest
namespace: dev
spec:
template:
spec:
containers:
- args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
command: ["/bin/bash", "-c"]
env:
- name: JobNumber
value: "00"
envFrom:
- configMapRef:
name: job-multi
image: imagepath
name: ubuntu-00
resources:
limits:
memory: “8000Mi"
cpu: "2880m"
- args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
command: ["/bin/bash", "-c"]
env:
- name: JobNumber
value: "01”
envFrom:
- configMapRef:
name: job-multi
image: imagepath
name: ubuntu-01
resources:
limits:
memory: “8000Mi"
cpu: "2880m"
Your YAML is very nice but it doesn't tell anything about how do you launch JMeter or what these shell scripts you invoke are doing.
If you just kick off 2 separate JMeter instances by means of k8s - JMeter will look at the number of active threads from the .jtl file and given the Sampler/Transaction names are the same JMeter "thinks" that the tests were executed on one engine.
The workaround is to add i.e. machineName() or __machineIP() function to sampler/transaction labels, this way JMeter will distinguish the results coming from different instances and you will see real number of active threads.
The solution would be running your JMeter test in Distributed Mode so master will run in one pod, slaves in their own pods and the master will be responsible for transferring .jmx script to the slaves and collecting results from them

Not able to start apache-nifi in aks

Hi all I am working on Nifi and I am trying to install it in AKS (Azure kubernetes service).
Using nifi 1.9.2 version. While installing it in AKS gives me an error
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedSFiVwC’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedK3S1JJ’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedbcm91T’: Operation not permitted
replacing target file /opt/nifi/nifi-current/conf/nifi.properties
sed: preserving permissions for ‘/opt/nifi/nifi-current/conf/sedIuYSe1’: Operation not permitted
NiFi running with PID 28.
The specified run.as user nifi
does not exist. Exiting.
Received trapped signal, beginning shutdown...
Below is my nifi.yml file
apiVersion: apps/v1
kind: Deployment
metadata:
name: nifi-core
spec:
replicas: 1
selector:
matchLabels:
app: nifi-core
template:
metadata:
labels:
app: nifi-core
spec:
containers:
- name: nifi-core
image: my-azurecr.io/nifi-core-prod:1.9.2
env:
- name: NIFI_WEB_HTTP_PORT
value: "8080"
- name: NIFI_VARIABLE_REGISTRY_PROPERTIES
value: "./conf/custom.properties"
resources:
requests:
cpu: "6"
memory: 12Gi
limits:
cpu: "6"
memory: 12Gi
ports:
- containerPort: 8080
volumeMounts:
- name: my-nifi-core-conf
mountPath: /opt/nifi/nifi-current/conf
volumes:
- name: my-nifi-core-conf
azureFile:
shareName: my-file-nifi-core/nifi/conf
secretName: my-nifi-secret
readOnly: false
I have some customization in nifi Dockerfile, which copies some config files related to my configuration. When I ran my-azurecr.io/nifi-core-prod:1.9.2 docker image on my local it works as expected
But when I try to run it on AKS its giving above error. since its related to permissions I have tried with both user nifi and root in Dockerfile.
All the required configuration files are provided in volume my-nifi-core-conf running in same resourse group.
Since I am starting nifi with docker my exception is, it will behave same regardless of environment. Either on my local or in AKS.
But error also say user nifi does not exist. The official nifi-image setup the user requirement.
Can anyone help, I cant event start container in interaction mode as pods in not in running mode. Thanks in advance.
I think your missing the Security Context definition for your Kubernetes Pod. The user that Nifi runs under within a Docker has a specific UID and GID, and with the error message you getting, I would suspect that because that user is not defined in the Pod's security context it's not launching as expected.
Have a look at section on the Kubernetes documentation about security contexts, and that should be enough get you started.
I would also have a look at using something like Minikube when testing Kubernetes deployments as Kubernetes adds a large number of controls around a container engine like Docker.
Security Contexts Docs: https://kubernetes.io/docs/tasks/configure-pod-container/security-context/
Minikube: https://kubernetes.io/docs/setup/learning-environment/minikube/
If you never figured this out, I was able to do this by running an initContainer before the main container, and changing the directory perms there.
initContainers:
- name: init1
image: busybox:1.28
volumeMounts:
- name: nifi-pvc
mountPath: "/opt/nifi/nifi-current"
command: ["sh", "-c", "chown -R 1000:1000 /opt/nifi/nifi-current"] #or whatever you want to do as root
update: does not work with nifi 1.14.0 - works with 1.13.2

GKE node with modprobe

Is there a way to load any kernel module ("modprobe nfsd" in my case) automatically after starting/upgrading nodes or in GKE? We are running an NFS server pod on our kubernetes cluster and it dies after every GKE upgrade
Tried both cos and ubuntu images, none of them seems to have nfsd loaded by default.
Also tried something like this, but it seems it does not do what it is supposed to do:
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
name: nfsd-modprobe
labels:
app: nfsd-modprobe
spec:
template:
metadata:
labels:
app: nfsd-modprobe
spec:
hostPID: true
containers:
- name: nfsd-modprobe
image: gcr.io/google-containers/startup-script:v1
imagePullPolicy: Always
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#! /bin/bash
modprobe nfs
modprobe nfsd
while true; do sleep 1; done
I faced the same issue, existing answer is correct, I want to expand it with working example of nfs pod within kubernetes cluster which has capabilities and libraries to load required modules.
It has two important parts:
privileged mode
mounted /lib/modules directory within the container to use it
nfs-server.yaml
kind: Pod
apiVersion: v1
metadata:
name: nfs-server-pod
spec:
containers:
- name: nfs-server-container
image: erichough/nfs-server
securityContext:
privileged: true
env:
- name: NFS_EXPORT_0
value: "/test *(rw,no_subtree_check,insecure,fsid=0)"
volumeMounts:
- mountPath: /lib/modules # mounting modules into container
name: lib-modules
readOnly: true # make sure it's readonly
- mountPath: /test
name: export-dir
volumes:
- hostPath: # using hostpath to get modules from the host
path: /lib/modules
type: Directory
name: lib-modules
- name: export-dir
emptyDir: {}
Reference which helped as well - Automatically load required kernel modules.
By default, you cannot load modules from inside a container because excluding kernel components is one of the main reason containers are lightweight and portable. You need to load the module from the host OS in order to make it available inside the container. This means you could simply launch a script that enables the kernel modules you want after each GKE upgrade.
However, there exists a somewhat hacky way to load kernel modules from inside a docker container. It all boils down to launching your container with escalated privileges and with access to certain host directories. You should try that if you really want to mount your kernel modules while inside a container.

What is the equivalent for depends_on in kubernetes

I have a docker compose file with the following entries
version: '2.1'
services:
mysql:
container_name: mysql
image: mysql:latest
volumes:
- ./mysqldata:/var/lib/mysql
environment:
MYSQL_ROOT_PASSWORD: 'password'
ports:
- '3306:3306'
healthcheck:
test: ["CMD", "curl", "-f", "http://localhost:3306"]
interval: 30s
timeout: 10s
retries: 5
test1:
container_name: test1
image: test1:latest
ports:
- '4884:4884'
- '8443'
depends_on:
mysql:
condition: service_healthy
links:
- mysql
The Test-1 container is dependent on mysql and it needs to be up and running.
In docker this can be controlled using health check and depends_on attributes.
The health check equivalent in kubernetes is readinessprobe which i have already created but how do we control the container startup in the pod's?????
Any directions on this is greatly appreciated.
My Kubernetes file:
apiVersion: apps/v1beta1
kind: Deployment
metadata:
name: deployment
spec:
replicas: 1
template:
metadata:
labels:
app: deployment
spec:
containers:
- name: mysqldb
image: "dockerregistry:mysqldatabase"
imagePullPolicy: Always
ports:
- containerPort: 3306
readinessProbe:
tcpSocket:
port: 3306
initialDelaySeconds: 15
periodSeconds: 10
- name: test1
image: "dockerregistry::test1"
imagePullPolicy: Always
ports:
- containerPort: 3000
That's the beauty of Docker Compose and Docker Swarm... Their simplicity.
We came across this same Kubernetes shortcoming when deploying the ELK stack.
We solved it by using a side-car (initContainer), which is just another container in the same pod thats run first, and when it's complete, kubernetes automatically starts the [main] container. We made it a simple shell script that is in loop until Elasticsearch is up and running, then it exits and Kibana's container starts.
Below is an example of a side-car that waits until Grafana is ready.
Add this 'initContainer' block just above your other containers in the Pod:
spec:
initContainers:
- name: wait-for-grafana
image: darthcabs/tiny-tools:1
args:
- /bin/bash
- -c
- >
set -x;
while [[ "$(curl -s -o /dev/null -w ''%{http_code}'' http://grafana:3000/login)" != "200" ]]; do
echo '.'
sleep 15;
done
containers:
.
.
(your other containers)
.
.
This was purposefully left out. The reason being is that applications should be responsible for their connect/re-connect logic for connecting to service(s) such as a database. This is outside the scope of Kubernetes.
While I don't know the direct answer to your question except this link (k8s-AppController), I don't think it's wise to use same deployment for DB and app. Because you are tightly coupling your db with app and loosing awesome k8s option to scale any one of them as needed. Further more if your db pod dies you loose your data as well.
Personally what I would do is to have a separate StatefulSet with Persistent Volume for database and Deployment for app and use Service to make sure their communication.
Yes I have to run few different commands and may need at least two separate deployment files but this way I am decoupling them and can scale them as needed. And my data is being persistent as well!
As mentioned, you should run the database and the application containers in separate pods and connect them with a service.
Unfortunately, both Kubernetes and Helm don't provide a functionality similar to what you've described. We had many issues with that and tried a few approaches until we have decided to develop a smallish utility that solved this problem for us.
Here's the link to the tool we've developed: https://github.com/Opsfleet/depends-on
You can make pods wait until other pods become ready according to their readinessProbe configuration. It's very close to Docker's depends_on functionality.
In Kubernetes terminology one your docker-compose set is a Pod.
So, there is no depends_on equivalent there. Kubernetes will check all containers in a pod and they all have to be alive for a mark that pod as Healthy and will always run them together.
In your case, you need to prepare configuration of Deployment like that:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
name: my-app
spec:
replicas: 1
template:
metadata:
labels:
app: app-and-db
spec:
containers:
- name: app
image: nginx
ports:
- containerPort: 80
- name: db
image: mysql
ports:
- containerPort: 3306
After pod will be started, your database will be available on localhost interface for your application, because of network conception:
Containers within a pod share an IP address and port space, and can find each other via localhost. They can also communicate with each other using standard inter-process communications like SystemV semaphores or POSIX shared memory.
But, as #leninhasda mentioned, it is not a good idea to run database and application in your pod and without Persistent Volume. Here is a good tutorial on how to run a stateful application in the Kubernetes.
https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
what about liveness and readiness ??? supports commands, http requests and more
apiVersion: v1
kind: Pod
metadata:
labels:
test: liveness
name: liveness-exec
spec:
containers:
- name: liveness
image: k8s.gcr.io/busybox
args:
- /bin/sh
- -c
- touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
livenessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5