Combining multiple Local-SSD on a node in Kubernetes (GKE) - kubernetes

The data required by my container is too large to fit on one local SSD. I also need to access the SSD's as one filesystem from my container. So I would need to attach multiple ones. How do I combine them (single partition, RAID0, etc) and make them accessible as one volume mount in my container?
This link shares how to mount an SSD https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/local-ssd to a mount path. I am not sure how you would merge multiple.
edit
The question asks how one would "combine" multiple SSD devices, individually mounted, on a single node in GKE.

WARNING
this is experimental and not intended for production use without
knowing what you are doing and only tested on gke version 1.16.x.
The approach includes a daemonset using a configmap to use nsenter (with wait tricks) for host namespace and privileged access so you can manage the devices. Specifically for GKE Local SSDs, we can unmount those devices and then raid0 them. InitContainer for the dirty work as this type of task seems most apparent for something you'd need to mark complete, and to then kill privileged container access (or even the Pod). Here is how it is done.
The example assumes 16 SSDs, however, you'll want to adjust the hardcoded values as necessary. Also, ensure your OS image reqs, I use Ubuntu. Also make sure the version of GKE you use starts local-ssd's at sd[b]
ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
name: local-ssds-setup
namespace: search
data:
setup.sh: |
#!/bin/bash
# returns exit codes: 0 = found, 1 = not found
isMounted() { findmnt -rno SOURCE,TARGET "$1" >/dev/null;} #path or device
# existing disks & mounts
SSDS=(/dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf /dev/sdg /dev/sdh /dev/sdi /dev/sdj /dev/sdk /dev/sdl /dev/sdm /dev/sdn /dev/sdo /dev/sdp /dev/sdq)
# install mdadm utility
apt-get -y update && apt-get -y install mdadm --no-install-recommends
apt-get autoremove
# OPTIONAL: determine what to do with existing, I wipe it here
if [ -b "/dev/md0" ]
then
echo "raid array already created"
if isMounted "/dev/md0"; then
echo "already mounted - unmounting"
umount /dev/md0 &> /dev/null || echo "soft error - assumed device was mounted"
fi
mdadm --stop /dev/md0
mdadm --zero-superblock "${SSDS[#]}"
fi
# unmount disks from host filesystem
for i in {0..15}
do
umount "${SSDS[i]}" &> /dev/null || echo "${SSDS[i]} already unmounted"
done
if isMounted "/dev/sdb";
then
echo ""
echo "unmount failure - prevent raid0" 1>&2
exit 1
fi
# raid0 array
yes | mdadm --create /dev/md0 --force --level=0 --raid-devices=16 "${SSDS[#]}"
echo "raid array created"
# format
mkfs.ext4 -F /dev/md0
# mount, change /mnt/ssd-array to whatever
mkdir -p /mnt/ssd-array
mount /dev/md0 /mnt/ssd-array
chmod a+w /mnt/ssd-array
wait.sh: |
#!/bin/bash
while sudo fuser /var/{lib/{dpkg,apt/lists},cache/apt/archives}/lock >/dev/null 2>&1; do sleep 1; done
DeamonSet pod spec
spec:
hostPID: true
nodeSelector:
cloud.google.com/gke-local-ssd: "true"
volumes:
- name: setup-script
configMap:
name: local-ssds-setup
- name: host-mount
hostPath:
path: /tmp/setup
initContainers:
- name: local-ssds-init
image: marketplace.gcr.io/google/ubuntu1804
securityContext:
privileged: true
volumeMounts:
- name: setup-script
mountPath: /tmp
- name: host-mount
mountPath: /host
command:
- /bin/bash
- -c
- |
set -e
set -x
# Copy setup script to the host
cp /tmp/setup.sh /host
# Copy wait script to the host
cp /tmp/wait.sh /host
# Wait for updates to complete
/usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/wait.sh
# Give execute priv to script
/usr/bin/nsenter -m/proc/1/ns/mnt -- chmod u+x /tmp/setup/setup.sh
# Wait for Node updates to complete
/usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/wait.sh
# If the /tmp folder is mounted on the host then it can run the script
/usr/bin/nsenter -m/proc/1/ns/mnt /tmp/setup/setup.sh
containers:
- image: "gcr.io/google-containers/pause:2.0"
name: pause

For high performance use cases, use the Ephemeral storage on local SSDs GKE feature. All local SSDs will be configures as a (striped) raid0 array and mounted into the pod.
Quick summary:
Create the node pool or cluster with the option: --ephemeral-storage local-ssd-count=X
Schedule to nodes with cloud.google.com/gke-ephemeral-storage-local-ssd.
Add an emptyDir volume.
Mount it with volumeMounts.
Here's how I used it with a DaemonSet:
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: myapp
labels:
app: myapp
spec:
selector:
matchLabels:
app: myapp
template:
metadata:
labels:
app: myapp
spec:
nodeSelector:
cloud.google.com/gke-ephemeral-storage-local-ssd: "true"
volumes:
- name: localssd
emptyDir: {}
containers:
- name: myapp
image: <IMAGE>
volumeMounts:
- mountPath: /scratch
name: localssd

You can use DaemonSet yaml file to deploy the pod will run on startup, assuming already created a cluster with 2 local-SSD (this pod will be in charge of creating the Raid0 disk)
kind: DaemonSet
apiVersion: extensions/v1beta1
metadata:
name: ssd-startup-script
labels:
app: ssd-startup-script
spec:
template:
metadata:
labels:
app: ssd-startup-script
spec:
hostPID: true
containers:
- name: ssd-startup-script
image: gcr.io/google-containers/startup-script:v1
imagePullPolicy: Always
securityContext:
privileged: true
env:
- name: STARTUP_SCRIPT
value: |
#!/bin/bash
sudo curl -s https://get.docker.com/ | sh
echo Done
The pod that will have access to the disk array in the above example is “/mnt/disks/ssd-array”
apiVersion: v1
kind: Pod
metadata:
name: test-pod
spec:
containers:
- name: test-container
image: ubuntu
volumeMounts:
- mountPath: /mnt/disks/ssd-array
name: ssd-array
args:
- sleep
- "1000"
nodeSelector:
cloud.google.com/gke-local-ssd: "true"
tolerations:
- key: "local-ssd"
operator: "Exists"
effect: "NoSchedule"
volumes:
- name: ssd-array
hostPath:
path: /mnt/disks/ssd-array
After deploying the test-pod, SSH to the pod from your cloud-shell or any instance.
Then run :
kubectl exec -it test-pod -- /bin/bash
After that you should be able to see the created file in the ssd-array disk.
cat test-file.txt

Related

kubernetes how to make a folder on my host as a Persistent volume

I am running a kuberneted cluster using minikube.
I want to mount a folder from my PC into the minikube.
How can i do that.
I see there is hostPath, but that used the node inside minikube
Just like in docker-compose we can mount a host folder into the container, is there any such provision
#Santhosh If i understand your question correctly, do you want to mount a path from within a container to a PV of type hostpath on your host? Have you tried this :-
apiVersion: v1
kind: Pod
metadata:
name: pv-recycler
namespace: default
spec:
restartPolicy: Never
volumes:
- name: vol
hostPath:
path: /any/path/it/will/be/replaced
containers:
- name: pv-recycler
image: "k8s.gcr.io/busybox"
command: ["/bin/sh", "-c", "test -e /scrub && rm -rf /scrub/..?* /scrub/.[!.]* /scrub/* && test -z \"$(ls -A /scrub)\" || exit 1"]
volumeMounts:
- name: vol
mountPath: /scrub

Kubernetes Pod permission denied on local volume

I have created a pod on Kubernetes and mounted a local volume but when I try to execute the ls command on locally mounted volume, I get a permission denied error. If I disable SELINUX then everything works fine. I am unable to make out how do I make it work with SELinux enabled.
Following is the output of permission denied:
kubectl apply -f testpod.yaml
root#olcne-operator-ol8 opc]# kubectl get all
NAME READY STATUS RESTARTS AGE
pod/testpod 1/1 Running 0 5s
# kubectl exec -i -t testpod /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root#testpod /]# cd /u01
[root#testpod u01]# ls
ls: cannot open directory '.': Permission denied
[root#testpod u01]#
Following is the testpod.yaml
cat testpod.yaml
kind: Pod
apiVersion: v1
metadata:
name: testpod
labels:
name: testpod
spec:
hostname: testpod
restartPolicy: Never
volumes:
- name: swvol
hostPath:
path: /u01
containers:
- name: testpod
image: oraclelinux:8
imagePullPolicy: Always
securityContext:
privileged: false
command: [/usr/sbin/init]
volumeMounts:
- mountPath: "/u01"
name: swvol
Selinux Configuration on worker node:
# sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 31
---
# semanage fcontext -l | grep kub | grep container_file
/var/lib/kubelet/pods(/.*)? all files system_u:object_r:container_file_t:s0
/var/lib/kubernetes/pods(/.*)? all files system_u:object_r:container_file_t:s0
Machine OS Details
rpm -qa | grep kube
kubectl-1.20.6-2.el8.x86_64
kubernetes-cni-0.8.1-1.el8.x86_64
kubeadm-1.20.6-2.el8.x86_64
kubelet-1.20.6-2.el8.x86_64
kubernetes-cni-plugins-0.9.1-1.el8.x86_64
----
cat /etc/oracle-release
Oracle Linux Server release 8.4
---
uname -r
5.4.17-2102.203.6.el8uek.x86_64
This is a community wiki answer posted for better visibility. Feel free to expand it.
SELinux labels can be assigned with seLinuxOptions:
apiVersion: v1
metadata:
name: testpod
labels:
name: testpod
spec:
hostname: testpod
restartPolicy: Never
volumes:
- name: swvol
hostPath:
path: /u01
containers:
- name: testpod
image: oraclelinux:8
imagePullPolicy: Always
command: [/usr/sbin/init]
volumeMounts:
- mountPath: "/u01"
name: swvol
securityContext:
seLinuxOptions:
level: "s0:c123,c456"
From the official documentation:
seLinuxOptions: Volumes that support SELinux labeling are relabeled
to be accessible by the label specified under seLinuxOptions.
Usually you only need to set the level section. This sets the
Multi-Category Security (MCS) label given to all Containers in the Pod
as well as the Volumes.
Based on the information from the original post on stackoverflow:
You can only specify the level portion of an SELinux label when relabeling a path destination pointed to by a hostPath volume. This
is automatically done so by the seLinuxOptions.level attribute
specified in your securityContext.
However attributes such as seLinuxOptions.type currently have no
effect on volume relabeling. As of this writing, this is still an
open issue within
Kubernetes

create an empty file inside a volume in Kubernetes pod

I have a legacy app which keep checking an empty file inside a directory and perform certain action if the file timestamp is changed.
I am migrating this app to Kubernetes so I want to create an empty file inside the pod. I tried subpath like below but it doesn't create any file.
apiVersion: v1
kind: Pod
metadata:
name: demo-pod
spec:
containers:
- name: demo
image: alpine
command: ["sleep", "3600"]
volumeMounts:
- name: volume-name
mountPath: '/volume-name-path'
subPath: emptyFile
volumes:
- name: volume-name
emptyDir: {}
describe pods shows
Containers:
demo:
Container ID: containerd://0b824265e96d75c5f77918326195d6029e22d17478ac54329deb47866bf8192d
Image: alpine
Image ID: docker.io/library/alpine#sha256:08d6ca16c60fe7490c03d10dc339d9fd8ea67c6466dea8d558526b1330a85930
Port: <none>
Host Port: <none>
Command:
sleep
3600
State: Running
Started: Wed, 10 Feb 2021 12:23:43 -0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-4gp4x (ro)
/volume-name-path from volume-name (rw,path="emptyFile")
ls on the volume also shows nothing.
k8 exec -it demo-pod -c demo ls /volume-name-path
any suggestion??
PS: I don't want to use a ConfigMap and simply wants to create an empty file.
If the objective is to create a empty file when the Pod starts, then the most easy way is to either use the entrypoint of the docker image or an init container.
With the initContainer, you could go with something like the following (or with a more complex init image which you build and execute a whole bash script or something similar):
apiVersion: v1
kind: Pod
metadata:
name: demo-pod
spec:
initContainers:
- name: create-empty-file
image: alpine
command: ["touch", "/path/to/the/directory/empty_file"]
volumeMounts:
- name: volume-name
mountPath: /path/to/the/directory
containers:
- name: demo
image: alpine
command: ["sleep", "3600"]
volumeMounts:
- name: volume-name
mountPath: /path/to/the/directory
volumes:
- name: volume-name
emptyDir: {}
Basically the init container gets executed first, runs its command and if it is successful, then it terminates and the main container starts running. They share the same volumes (and they can also mount them at different paths) so in the example, the init container mount the emptyDir volume, creates an empty file and then complete. When the main container starts, the file is already there.
Regarding your legacy application which is getting ported on Kubernetes:
If you have control of the Dockerfile, you could simply change it create an empty file at the path you are expecting it to be, so that when the app starts, the file is already created there, empty, from the beginning, just exactly as you add the application to the container, you can add also other files.
For more info on init container, please check the documentation (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/)
I think you may be interested in Container Lifecycle Hooks.
In this case, the PostStart hook may help create an empty file as soon as the container is started:
This hook is executed immediately after a container is created.
In the example below, I will show you how you can use the PostStart hook to create an empty file-test file.
First I created a simple manifest file:
# demo-pod.yml
apiVersion: v1
kind: Pod
metadata:
labels:
run: demo-pod
name: demo-pod
spec:
containers:
- image: alpine
name: demo-pod
command: ["sleep", "3600"]
lifecycle:
postStart:
exec:
command: ["touch", "/mnt/file-test"]
After creating the Pod, we can check if the demo-pod container has an empty file-test file:
$ kubectl apply -f demo-pod.yml
pod/demo-pod created
$ kubectl exec -it demo-pod -- sh
/ # ls -l /mnt/file-test
-rw-r--r-- 1 root root 0 Feb 11 09:08 /mnt/file-test
/ # cat /mnt/file-test
/ #

Copying files to a local/host directory

I am trying to copy files from a container to a local/host directory. Running my experiments on minikube. Tried starting minikube with a mount as: minikube mount /tmp/export:/data/export and it still does not work.
I have a single pod, that upon startup runs a simple script:
timeout --signal=SIGINT 10s clinic bubbleprof -- node index.js >> /tmp/clinic.output.log && \
cp -R `grep "." /tmp/clinic.output.log | tail -1 | grep -oE '[^ ]+$'`* /data/export/ && \
echo "Finished copying clinic run generated files"
Once my script finishes its run, the container dies. This happens because bash is the process with PID 1. I don't mind this. My problem is that /tmp/export is empty, after the files should have been copied out.
My pod yaml:
apiVersion: v1
kind: Pod
metadata:
name: clinic-testapp
spec:
containers:
- name: clinic-testapp
image: username/container-image:0.0.11
ports:
- containerPort: 80
volumeMounts:
- name: clinic-storage
mountPath: /data/export
volumes:
- name: clinic-storage
hostPath:
path: /tmp/export
Am I doing something wrong? Please advise.

Replication Controller replica ID in an environment variable?

I'm attempting to inject a ReplicationController's randomly generated pod ID extension (i.e. multiverse-{replicaID}) into a container's environment variables. I could manually get the hostname and extract it from there, but I'd prefer if I didn't have to add the special case into the script running inside the container, due to compatibility reasons.
If a pod is named multiverse-nffj1, INSTANCE_ID should equal nffj1. I've scoured the docs and found nothing.
apiVersion: v1
kind: ReplicationController
metadata:
name: multiverse
spec:
replicas: 3
template:
spec:
containers:
- env:
- name: INSTANCE_ID
value: $(replicaID)
I've tried adding a command into the controller's template configuration to create the environment variable from the hostname, but couldn't figure out how to make that environment variable available to the running script.
Is there a variable I'm missing, or does this feature not exist? If it doesn't, does anyone have any ideas on how to make this to work without editing the script inside of the container?
There is an answer provided by Anton Kostenko about inserting DB credentials into container environment variables, but it could be applied to your case also. It is all about the content of the InitContainer spec.
You can use InitContainer to get the hash from the container’s hostname and put it to the file on the shared volume that you mount to the container.
In this example InitContainer put the Pod name into the INSTANCE_ID environment variable, but you can modify it according to your needs:
Create the init.yaml file with the content:
apiVersion: v1
kind: Pod
metadata:
name: init-test
spec:
containers:
- name: init-test
image: ubuntu
args: [bash, -c, 'source /data/config && echo $INSTANCE_ID && while true ; do sleep 1000; done ']
volumeMounts:
- name: config-data
mountPath: /data
initContainers:
- name: init-init
image: busybox
command: ["sh","-c","echo -n INSTANCE_ID=$(hostname) > /data/config"]
volumeMounts:
- name: config-data
mountPath: /data
volumes:
- name: config-data
emptyDir: {}
Create the pod using following command:
kubectl create -f init.yaml
Check if Pod initialization is done and is Running:
kubectl get pod init-test
Check the logs to see the results of this example configuration:
$ kubectl logs init-test
init-test