cloudInitNoCloud userData is not working in k8s (KubeVirt)

My YAML file looks like this:
apiVersion: kubevirt.io/v1
kind: VirtualMachineInstance
metadata:
  name: m1
spec:
  domain:
    cpu:
      cores: 4
    devices:
      disks:
      - name: harddrive
        disk: {}
      - name: cloudinitdisk
        disk: {}
      interfaces:
      - name: ovs-net
        bridge: {}
      - name: default
        masquerade: {}
    resources:
      requests:
        memory: 8G
  volumes:
  - name: harddrive
    containerDisk:
      image: 1.1.1.1:8888/redhat/redhat79:latest
  - name: cloudinitdisk
    cloudInitNoCloud:
      userData: |
        #!/bin/bash
        echo 1 > /opt/1.txt
  networks:
  - name: ovs-net
    multus:
      networkName: ovs-vlan-100
  - name: default
    pod: {}
The VMI is running. I logged in to the VM and there is nothing in the directory '/opt'. I found a disk sdb; when I mount sdb to /mnt I can see the file 'userdata', and its content is correct.
I don't know where I went wrong.
K8S 1.22.10
I also tried two other methods:
1)
cloudInitNoCloud:
  userData: |
    bootcmd:
    - touch /opt/1.txt
    runcmd:
    - touch /opt/2.txt
2)
cloudInitNoCloud:
  secretRef:
    name: my-vmi-secret
I expect cloudInitNoCloud to work and run my commands.

I found the problem: the Docker image I used does not have the cloud-init package installed.
The official KubeVirt documentation doesn't mention this, so I assumed I could use it directly.
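For anyone who hits the same thing: the fix is to bake cloud-init into the guest image itself. Below is a rough sketch of one way to do that with virt-customize (from libguestfs-tools) and then repackage the disk as a containerDisk; the file names, registry and tag are placeholders based on my setup, not a verified recipe.

# install cloud-init inside the guest disk image (run on a build host with libguestfs-tools)
virt-customize -a redhat79.qcow2 --install cloud-init

# repackage the disk as a KubeVirt containerDisk (KubeVirt expects the disk under /disk,
# readable by the qemu user, hence the chown to 107:107)
cat > Dockerfile <<'EOF'
FROM scratch
ADD --chown=107:107 redhat79.qcow2 /disk/
EOF
docker build -t 1.1.1.1:8888/redhat/redhat79:cloud-init .
docker push 1.1.1.1:8888/redhat/redhat79:cloud-init

Note also that cloud-config style userData (the bootcmd/runcmd variant above) normally needs to start with a #cloud-config header line, while script-style userData starts with #!/bin/bash.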

Related

Getting JAR file from S3 using Flink Kubernetes operator

I'm experimenting with the new Flink Kubernetes operator and I've been able to do pretty much everything that I need besides one thing: getting a JAR file from the S3 file system.
Context
I have a Flink application running in an EKS cluster in AWS and have all the information saved in an S3 bucket. Things like savepoints, checkpoints, high availability metadata and JAR files are all stored there.
I've been able to save the savepoints, checkpoints and high availability information in the bucket, but when trying to get the JAR file from the same bucket I get the error:
Could not find a file system implementation for scheme 's3'. The scheme is directly supported by Flink through the following plugins: flink-s3-fs-hadoop, flink-s3-fs-presto.
I was able to find this thread, but I wasn't able to get the resource fetcher to work correctly. Also, that solution is not ideal, and I was looking for a more direct approach.
Deployment files
Here are the files that I'm deploying in the cluster:
deployment.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: flink-deployment
spec:
  podTemplate:
    apiVersion: v1
    kind: Pod
    metadata:
      name: pod-template
    spec:
      containers:
      - name: flink-main-container
        env:
        - name: ENABLE_BUILT_IN_PLUGINS
          value: flink-s3-fs-presto-1.15.3.jar;flink-s3-fs-hadoop-1.15.3.jar
        volumeMounts:
        - mountPath: /flink-data
          name: flink-volume
      volumes:
      - name: flink-volume
        hostPath:
          path: /tmp
          type: Directory
  image: flink:1.15
  flinkVersion: v1_15
  flinkConfiguration:
    state.checkpoints.dir: s3://kubernetes-operator/checkpoints
    state.savepoints.dir: s3://kubernetes-operator/savepoints
    high-availability: org.apache.flink.kubernetes.highavailability.KubernetesHaServicesFactory
    high-availability.storageDir: s3://kubernetes-operator/ha
  jobManager:
    resource:
      memory: "2048m"
      cpu: 1
  taskManager:
    resource:
      memory: "2048m"
      cpu: 1
  serviceAccount: flink
session-job.yml
apiVersion: flink.apache.org/v1beta1
kind: FlinkSessionJob
metadata:
  name: flink-session-job
spec:
  deploymentName: flink-deployment
  job:
    jarURI: s3://kubernetes-operator/savepoints/flink.jar
    parallelism: 3
    upgradeMode: savepoint
    savepointTriggerNonce: 0
The Flink Kubernetes operator version that I'm using is 1.3.1
Is there anything that I'm missing or doing wrong?
The download of the jar happens in the flink-kubernetes-operator pod. So, when you apply a FlinkSessionJob, the flink-operator recognizes the CRD, tries to download the jar from the jarURI location, constructs a JobGraph and submits the session job to the JobDeployment. The Flink Kubernetes Operator also has Flink running inside it in order to build the JobGraph.
So you will have to add flink-s3-fs-hadoop-1.15.3.jar at the location /opt/flink/plugins/s3-fs-hadoop/ inside the flink-kubernetes-operator pod.
You can add the jar either by extending the ghcr.io/apache/flink-kubernetes-operator image, curling the jar and copying it to the plugins location (see the sketch after the initContainer example below),
or
you can write an initContainer which downloads the jar to a volume and mount that volume:
volumes:
- name: s3-plugin
  emptyDir: {}
initContainers:
- name: busybox
  image: busybox:latest
  # download the plugin jar into the shared volume (example URL from Maven Central)
  command: ["sh", "-c"]
  args:
  - wget -O /opt/flink/plugins/s3-fs-hadoop/flink-s3-fs-hadoop-1.15.3.jar https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.15.3/flink-s3-fs-hadoop-1.15.3.jar
  volumeMounts:
  - mountPath: /opt/flink/plugins/s3-fs-hadoop
    name: s3-plugin
containers:
- image: 'ghcr.io/apache/flink-kubernetes-operator:95128bf'
  name: flink-kubernetes-operator
  volumeMounts:
  - mountPath: /opt/flink/plugins/s3-fs-hadoop
    name: s3-plugin
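For the first option (extending the operator image), a minimal sketch could look like the following; this assumes curl is available in the base image and uses a placeholder registry name, so treat it as a starting point rather than a tested recipe:

cat > Dockerfile <<'EOF'
FROM ghcr.io/apache/flink-kubernetes-operator:1.3.1
RUN mkdir -p /opt/flink/plugins/s3-fs-hadoop && \
    curl -fL -o /opt/flink/plugins/s3-fs-hadoop/flink-s3-fs-hadoop-1.15.3.jar \
      https://repo1.maven.org/maven2/org/apache/flink/flink-s3-fs-hadoop/1.15.3/flink-s3-fs-hadoop-1.15.3.jar
EOF
docker build -t my-registry/flink-kubernetes-operator:1.3.1-s3 .
docker push my-registry/flink-kubernetes-operator:1.3.1-s3

You would then point the operator deployment (or its Helm values) at the extended image instead of the default one.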
Also, if you are using a serviceAccount for S3 authentication, set the config below in flinkConfiguration:
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider

Mounting cameras to a pod gets MountVolume.SetUp failed for volume "default-token-c8hm5": failed to sync secret cache: timed out waiting for the condition

On my Jetson NX, I would like to write a YAML file that mounts 2 cameras into a pod.
The YAML:
containers:
- name: my-pod
  image: my_image:v1.0.0
  imagePullPolicy: Always
  volumeMounts:
  - mountPath: /dev/video0
    name: dev-video0
  - mountPath: /dev/video1
    name: dev-video1
  resources:
    limits:
      nvidia.com/gpu: 1
  ports:
  - containerPort: 9000
  command: ["/bin/bash"]
  args: ["-c", "while true; do echo hello; sleep 10; done"]
  securityContext:
    privileged: true
volumes:
- hostPath:
    path: /dev/video0
    type: ""
  name: dev-video0
- hostPath:
    path: /dev/video1
    type: ""
  name: dev-video1
but when I deploy it as a pod, I get the error:
MountVolume.SetUp failed for volume "default-token-c8hm5" : failed to sync secret cache: timed out waiting for the condition
I tried removing the volumes from the YAML, and then the pod can be deployed successfully. Any comments on this issue?
Another issue is that when a pod runs into problems, it consumes the rest of the storage on my Jetson NX. I guess k8s creates lots of temporary files or logs when something goes wrong? Is there any solution to this? Otherwise all of my pods will be evicted...

The active users count doesn't match on execution through 2 containers in Kubernetes

We are running a jmx test through Taurus using 2 containers in Kubernetes.
We are seeing only 50 users in the results instead of 100 (50 * 2 containers).
Can anyone please throw some light on whether we are missing something here?
We get two jtl files, and checking them individually or combined, the total users are the same 50 only. Is it related to the same thread name being generated and logged in the jtl file, or something else?
Here are the YAML details:
apiVersion: v1
kind: ConfigMap
metadata:
  name: joba
  namespace: AAA
data:
  protocol: "https"
  serverUrl: "testurl"
  users: "50"
  duration: "1m"
  nodeName: "Nodename"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: perftest
  namespace: dev
spec:
  template:
    spec:
      containers:
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "00"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-00
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "01"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-01
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
Your YAML is very nice, but it doesn't tell anything about how you launch JMeter or what the shell scripts you invoke are doing.
If you just kick off 2 separate JMeter instances by means of k8s, JMeter will look at the number of active threads from the .jtl file, and given that the Sampler/Transaction names are the same, JMeter "thinks" the tests were executed on one engine.
The workaround is to add the __machineName() or __machineIP() function to the sampler/transaction labels; this way JMeter will distinguish the results coming from different instances and you will see the real number of active threads.
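For example, a transaction/sampler label along these lines (a hypothetical label, assuming these functions are available in your JMeter version) is enough to keep the two containers' results apart:
${__machineName()}_HomePage
so each container reports its samples under a distinct label.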
The solution would be running your JMeter test in Distributed Mode, so the master runs in one pod, the slaves in their own pods, and the master is responsible for transferring the .jmx script to the slaves and collecting the results from them.

GKE node with modprobe

Is there a way to load a kernel module ("modprobe nfsd" in my case) automatically after starting/upgrading nodes in GKE? We are running an NFS server pod on our Kubernetes cluster and it dies after every GKE upgrade.
I tried both the COS and Ubuntu images; neither of them seems to have nfsd loaded by default.
I also tried something like this, but it does not seem to do what it is supposed to do:
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
  name: nfsd-modprobe
  labels:
    app: nfsd-modprobe
spec:
  template:
    metadata:
      labels:
        app: nfsd-modprobe
    spec:
      hostPID: true
      containers:
      - name: nfsd-modprobe
        image: gcr.io/google-containers/startup-script:v1
        imagePullPolicy: Always
        securityContext:
          privileged: true
        env:
        - name: STARTUP_SCRIPT
          value: |
            #! /bin/bash
            modprobe nfs
            modprobe nfsd
            while true; do sleep 1; done
I faced the same issue. The existing answer is correct; I want to expand it with a working example of an NFS pod within a Kubernetes cluster which has the capabilities and libraries needed to load the required modules.
It has two important parts:
privileged mode
the /lib/modules directory mounted into the container so it can be used
nfs-server.yaml
kind: Pod
apiVersion: v1
metadata:
  name: nfs-server-pod
spec:
  containers:
  - name: nfs-server-container
    image: erichough/nfs-server
    securityContext:
      privileged: true
    env:
    - name: NFS_EXPORT_0
      value: "/test *(rw,no_subtree_check,insecure,fsid=0)"
    volumeMounts:
    - mountPath: /lib/modules # mounting modules into the container
      name: lib-modules
      readOnly: true # make sure it's read-only
    - mountPath: /test
      name: export-dir
  volumes:
  - hostPath: # using hostPath to get modules from the host
      path: /lib/modules
      type: Directory
    name: lib-modules
  - name: export-dir
    emptyDir: {}
A reference which helped as well: Automatically load required kernel modules.
By default, you cannot load modules from inside a container, because excluding kernel components is one of the main reasons containers are lightweight and portable. You need to load the module from the host OS in order to make it available inside the container. This means you could simply launch a script that enables the kernel modules you want after each GKE upgrade.
However, there exists a somewhat hacky way to load kernel modules from inside a Docker container. It all boils down to launching your container with escalated privileges and with access to certain host directories. You should try that if you really want to load your kernel modules while inside a container.
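As an illustration of that hacky approach, here is a minimal sketch of the kind of startup script the DaemonSet above could run. It assumes a privileged container with hostPID: true and that nsenter is available in the image, so that modprobe can be executed in the host's mount namespace; adjust the module names as needed:

#!/bin/bash
# run modprobe in the host's mount namespace (PID 1 = host init) so it can see the host's /lib/modules
nsenter -t 1 -m -- modprobe nfs
nsenter -t 1 -m -- modprobe nfsd
# verify that the modules are loaded on the host
nsenter -t 1 -m -- lsmod | grep nfsd
# keep the DaemonSet pod alive
while true; do sleep 3600; done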

Pre-populating Local SSD disk in GCP Kubernetes for readonly multipods usage

What is the best way to preload large files into a local PersistentVolume SSD before it gets used by Kubernetes pods?
The goal is to have multiple pods (they could be multiple instances of the same pod, or different pods) share the same local SSD drive in read-only mode. The drive would need to be initialized somehow with a large dataset.
The Google Local SSD docs describe running the local volume static provisioner, but that approach only creates a PersistentVolume and does not initialize it.
Basically, you can add an init container to your pod that initializes the SSD: adds the data, etc.
apiVersion: v1
kind: Pod
metadata:
  name: "test-ssd"
spec:
  initContainers:
  - name: "init"
    image: "ubuntu:14.04"
    command: ["/bin/init_my_ssd.sh"]
    volumeMounts:
    - mountPath: "/test-ssd/"
      name: "test-ssd"
  containers:
  - name: "shell"
    image: "ubuntu:14.04"
    command: ["/bin/sh", "-c"]
    args: ["echo 'hello world' > /test-ssd/test.txt && sleep 1 && cat /test-ssd/test.txt"]
    volumeMounts:
    - mountPath: "/test-ssd/"
      name: "test-ssd"
  volumes:
  - name: "test-ssd"
    hostPath:
      path: "/mnt/disks/ssd0"
  nodeSelector:
    cloud.google.com/gke-local-ssd: "true"