Mounting a hostPath persistent volume to an Argo Workflow task template - kubernetes

I am working on a small proof-of-concept project for my company and would like to use Argo Workflows to automate some data engineering tasks. It's really easy to get set up and I've been able to create a number of workflows that process data that is stored in a Docker image or is retrieved from a REST API. However, to work with our sensitive data I would like to mount a hostPath persistent volume to one of my workflow tasks. When I follow the documentation I don't get the desired behavior: the directory appears empty.
OS: Ubuntu 18.04.4 LTS
Kubernetes executor: Minikube v1.20.0
Kubernetes version: v1.20.2
Argo Workflows version: v3.1.0-rc4
My persistent volume (claim) looks like this:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: argo-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/data"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: argo-hello
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
and I run
kubectl -n argo apply -f pv.yaml
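As a quick sanity check before running the workflow, the PV/PVC binding can be verified with standard kubectl commands along these lines (a sketch, not part of the original setup):
kubectl -n argo get pv,pvc
kubectl -n argo describe pvc argo-hello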
My workflow looks as follows:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: hello-world-volumes-
spec:
  entrypoint: dag-template
  arguments:
    parameters:
      - name: file
        value: /mnt/vol/test.txt
  volumes:
    - name: datadir
      persistentVolumeClaim:
        claimName: argo-hello
  templates:
    - name: dag-template
      inputs:
        parameters:
          - name: file
      dag:
        tasks:
          - name: readFile
            arguments:
              parameters: [{name: path, value: "{{inputs.parameters.file}}"}]
            template: read-file-template
          - name: print-message
            template: helloargo
            arguments:
              parameters: [{name: msg, value: "{{tasks.readFile.outputs.result}}"}]
            dependencies: [readFile]
    - name: helloargo
      inputs:
        parameters:
          - name: msg
      container:
        image: lambertsbennett/helloargo
        args: ["-msg", "{{inputs.parameters.msg}}"]
    - name: read-file-template
      inputs:
        parameters:
          - name: path
      container:
        image: alpine:latest
        command: [sh, -c]
        args: ["find /mnt/vol; ls -a /mnt/vol"]
        volumeMounts:
          - name: datadir
            mountPath: /mnt/vol
When this workflow executes it just prints an empty directory even though I populated the host directory with files. Is there something I am fundamentally missing? Thanks for any help.
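As the related answers below point out for kind and Docker Desktop, a hostPath volume refers to the Kubernetes node's filesystem; with Minikube that is the Minikube VM or container, not the workstation. A quick check along these lines (a sketch; /path/on/workstation is a placeholder) shows what the node actually sees, and minikube mount is one way to expose a local folder there:
# what does the Minikube node see at the hostPath?
minikube ssh -- ls -la /tmp/data

# one way to expose a local folder inside the Minikube VM/container
minikube mount /path/on/workstation:/tmp/data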

Related

Mounting Windows local folder into pod

I'm running a Ubuntu container with SQL Server in my local Kubernetes environment with Docker Desktop on a Windows laptop.
Now I'm trying to mount a local folder (C:\data\sql) that contains database files into the pod.
For this, I configured a persistent volume and persistent volume claim in Kubernetes, but it doesn't seem to mount correctly. I don't see errors or anything, but when I go into the container using docker exec -it and inspect the data folder, it's empty. I expect the files from the local folder to appear in the mounted folder 'data', but that's not the case.
Is something wrongly configured in the PV, PVC or pod?
Here are my yaml files:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dev-customer-db-pv
  labels:
    type: local
    app: customer-db
    chart: customer-db-0.1.0
    release: dev
    heritage: Helm
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /C/data/sql
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dev-customer-db-pvc
  labels:
    app: customer-db
    chart: customer-db-0.1.0
    release: dev
    heritage: Helm
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dev-customer-db
  labels:
    ufo: dev-customer-db-config
    app: customer-db
    chart: customer-db-0.1.0
    release: dev
    heritage: Helm
spec:
  selector:
    matchLabels:
      app: customer-db
      release: dev
  replicas: 1
  template:
    metadata:
      labels:
        app: customer-db
        release: dev
    spec:
      volumes:
        - name: dev-customer-db-pv
          persistentVolumeClaim:
            claimName: dev-customer-db-pvc
      containers:
        - name: customer-db
          image: "mcr.microsoft.com/mssql/server:2019-latest"
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: dev-customer-db-pv
              mountPath: /data
          envFrom:
            - configMapRef:
                name: dev-customer-db-config
            - secretRef:
                name: dev-customer-db-secrets
At first, I was trying to define a volume in the pod without PV and PVC, but then I got access denied errors when I tried to read files from the mounted data folder.
spec:
  volumes:
    - name: dev-customer-db-data
      hostPath:
        path: C/data/sql
  containers:
    ...
    volumeMounts:
      - name: dev-customer-db-data
        mountPath: data
I've also tried to Helm install with --set volumePermissions.enabled=true but this didn't solve the access denied errors.
Based on this info from GitHub for Docker, hostPath volumes are not supported in WSL 2.
Thus, the following workaround can be used.
We just need to prefix the initial host path /c/data/sql with /run/desktop/mnt/host. There is no need for a PersistentVolume and PersistentVolumeClaim in this case - just remove them.
I changed spec.volumes of the Deployment according to the hostPath configuration documentation on the Kubernetes site:
volumes:
  - name: dev-customer-db-pv
    hostPath:
      path: /run/desktop/mnt/host/c/data/sql
      type: Directory
After applying these changes, the files can be found in the data folder in the pod, since mountPath is /data.
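For reference, this is roughly how the Deployment's pod spec ends up looking after the change (a sketch assembled from the snippets above; the PV and PVC objects are simply removed and the container's volumeMounts stay as they were):
spec:
  template:
    spec:
      volumes:
        - name: dev-customer-db-pv
          hostPath:
            # Docker Desktop with WSL 2 exposes the Windows drives under this prefix
            path: /run/desktop/mnt/host/c/data/sql
            type: Directory
      containers:
        - name: customer-db
          image: "mcr.microsoft.com/mssql/server:2019-latest"
          volumeMounts:
            - name: dev-customer-db-pv
              mountPath: /data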

Need support on using git-sync and persistent volumes

I am trying to use git-sync to write data from a gitlab repo to a persistent volume, then pull that data into another pod (trim_load) and perform a job. Here are the manifests I have set up. I am new to this and developing locally, and I could use all the direction I can get!
I am getting an error that the directory doesn't exist; it does exist on my local machine, but not on the kind cluster that I am using. How do I create a directory on the kind cluster?
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dbt-pv
spec:
  capacity:
    storage: 2Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /Users/my_user/k8s/pv1
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - kind-control-plane
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: dbt-pvc
spec:
  storageClassName: local-storage
  accessModes:
    - ReadWriteMany
  volumeName: dbt-pv
  resources:
    requests:
      storage: 1Gi
apiVersion: v1
kind: Pod
metadata:
  name: gitsync-sidecar
spec:
  containers:
    - name: git-sync
      # This container pulls git data and publishes it into volume
      # "content-from-git". In that volume you will find a symlink
      # "current" (see -dest below) which points to a checked-out copy of
      # the master branch (see -branch) of the repo (see -repo).
      # NOTE: git-sync already runs as non-root.
      image: k8s.gcr.io/git-sync/git-sync:v3.3.4
      args:
        - --repo= <the git repo I wanna copy HTTPS link>
        - --branch=master
        - --depth=1
        - --period=60
        - --link=current
        - --root=/git # I don't know what this means
      volumeMounts:
        - name: dbt-pv
          mountPath: /git # I don't know what this means
  volumes:
    - name: dbt-pv
      persistentVolumeClaim:
        claimName: dbt-pvc
apiVersion: v1
kind: Pod
metadata:
  name: trim-pod
spec:
  containers:
    - name: trim-pod-cont
      image: <my docker container to run the code>
      volumeMounts:
        - name: dbt-pv
          mountPath: /tmp/dbt
  volumes:
    - name: dbt-pv
      persistentVolumeClaim:
        claimName: dbt-pvc
I'm not familiar with kind, but from what I gather on their website it works like Minikube, which means your cluster runs inside a container itself.
The folder exists on your local machine, but your volume is looking for a folder on the host machine, which is the container running your cluster.
You have to open a shell in your cluster container and create the folder there.
If you want to access this folder from your local machine, you then have to mount your local folder into your kind cluster.
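To make that concrete, here is a rough sketch of both options (the paths and node name match the manifests above; treat this as an illustration rather than an exact recipe):
# Option 1: create the folder inside the kind node container
docker exec -it kind-control-plane mkdir -p /Users/my_user/k8s/pv1

# Option 2: mount the local folder into the kind node when the cluster is created
# (kind cluster config, used with: kind create cluster --config kind-config.yaml)
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    extraMounts:
      - hostPath: /Users/my_user/k8s/pv1       # folder on the local machine
        containerPath: /Users/my_user/k8s/pv1  # folder inside the kind node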

Multiple Persistent Volumes with the same mount path Kubernetes

I have created 3 CronJobs in Kubernetes. The format is exactly the same for every one of them except the names. These are the following specs:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-1 # for others it's test-job-2 and test-job-3
  namespace: cron-test
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: test-job-1 # for others it's test-job-2 and test-job-3
              image: busybox
              imagePullPolicy: IfNotPresent
              command:
                - "/bin/sh"
                - "-c"
              args:
                - cd database-backup && touch $(date +%Y-%m-%d:%H:%M).test-job-1 && ls -la # for others the filename includes test-job-2 and test-job-3 respectively
              volumeMounts:
                - mountPath: "/database-backup"
                  name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
          volumes:
            - name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
              persistentVolumeClaim:
                claimName: test-job-1-pvc # for others it's test-job-2-pvc and test-job-3-pvc
And also the following Persistent Volume Claims and Persistent Volume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-job-1-pvc # for others it's test-job-2-pvc or test-job-3-pvc
  namespace: cron-test
spec:
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  resources:
    requests:
      storage: 1Gi
  volumeName: test-job-1-pv # depending on the name it's test-job-2-pv or test-job-3-pv
  storageClassName: manual
  volumeMode: Filesystem
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
  namespace: cron-test
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/database-backup"
So all in all there are 3 CronJobs, 3 PersistentVolumes and 3 PersistentVolumeClaims. I can see that the PersistentVolumeClaims and PersistentVolumes are bound correctly to each other. So test-job-1-pvc <--> test-job-1-pv, test-job-2-pvc <--> test-job-2-pv and so on. Also the pods associated with each PVC are the corresponding pods created by each CronJob. For example test-job-1-1609066800-95d4m <--> test-job-1-pvc and so on. After letting the cron jobs run for a bit I create another pod with the following specs to inspect test-job-1-pvc:
apiVersion: v1
kind: Pod
metadata:
  name: data-access
  namespace: cron-test
spec:
  containers:
    - name: data-access
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data-access-volume
          mountPath: /database-backup
  volumes:
    - name: data-access-volume
      persistentVolumeClaim:
        claimName: test-job-1-pvc
Just a simple pod that keeps running all the time. When I get inside that pod with exec and see inside the /database-backup directory I see all the files created from all the pods created by the 3 CronJobs.
What did I expect to see?
I expected to see only the files created by test-job-1.
Is this something expected to happen? And if so how can you separate the PersistentVolumes to avoid something like this?
I suspect this is caused by the PersistentVolume definition: if you really only changed the name, all volumes are mapped to the same folder on the host.
hostPath:
  path: "/database-backup"
Try giving each volume a unique folder, e.g.
hostPath:
  path: "/database-backup/volume1"

how to set rabbitmq data directory to pvc in kubernetes pod

I tried to create a standalone RabbitMQ Kubernetes service, and the data should be mounted to my persistent volume.
.....
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
    loopback_users.guest = false
---
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: standalone-rabbitmq
spec:
  serviceName: standalone-rabbitmq
  replicas: 1
  template:
    .....
        volumeMounts:
          - name: config-volume
            mountPath: /etc/rabbitmq
          - name: standalone-rabbitmq-data
            mountPath: /data
      volumes:
        - name: config-volume
          configMap:
            name: rabbitmq-config
            items:
              - key: rabbitmq.conf
                path: rabbitmq.conf
              - key: enabled_plugins
                path: enabled_plugins
        - name: standalone-rabbitmq-data
          persistentVolumeClaim:
            claimName: standalone-rabbitmq-pvc-test
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: standalone-rabbitmq-pvc-test
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: test-storage-class
According to my research, I understood that the data directory of RabbitMQ is RABBITMQ_MNESIA_DIR (please see https://www.rabbitmq.com/relocate.html). So I just wanted to set this parameter to "/data", so that my new PVC (standalone-rabbitmq-pvc-test) is used to keep the data.
Can you tell me how to configure this?
HTH. Here's pretty much my configuration, from which you can see there are three mount points: the first is the data, the second the config, and the third the definitions, as in the YAML below.
volumeMounts:
  - mountPath: /var/lib/rabbitmq
    name: rmqdata
  - mountPath: /etc/rabbitmq
    name: config
  - mountPath: /etc/definitions
    name: definitions
    readOnly: true
And here's the PVC template stuff.
volumeClaimTemplates:
  - metadata:
      creationTimestamp: null
      name: rmqdata
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 100Gi
      storageClassName: nfs-provisioner
      volumeMode: Filesystem
Post update: as per the comments, you can mount it directly to the data folder like this. The section where you assign rmqdata to the mountPath stays the same.
volumes:
  - hostPath:
      path: /data
      type: DirectoryOrCreate
    name: rmqdata
In order to add the path, create a new file, let's say 'rabbitmq.properties' and add all the environment variables you'll need, one per line:
echo "RABBITMQ_MNESIA_DIR=/data" >> rabbitmq.properties
Then run kubectl create configmap rabbitmq-config --from-file=rabbitmq.properties to generate the configmap.
If you need to aggregate multiple config files in one ConfigMap, pass the folder's full path in the --from-file argument.
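For example, assuming the files live in a local directory called rabbitmq-config/:
kubectl create configmap rabbitmq-config --from-file=./rabbitmq-config/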
Then you can run kubectl get configmaps rabbitmq-config -o yaml to display the YAML that was created:
user@k8s:~$ kubectl get configmaps rabbitmq-config -o yaml
apiVersion: v1
data:
  rabbitmq.properties: |
    RABBITMQ_MNESIA_DIR=/data
kind: ConfigMap
metadata:
  creationTimestamp: "2019-12-30T11:33:27Z"
  name: rabbitmq-config
  namespace: default
  resourceVersion: "1106939"
  selfLink: /api/v1/namespaces/default/configmaps/rabbitmq-config
  uid: 4c6b1599-a54b-4e0e-9b7d-2799ea5d9e39
If all other aspects of your ConfigMap are correct, you can just add these lines to its data section:
rabbitmq.properties: |
  RABBITMQ_MNESIA_DIR=/data
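Put together with the ConfigMap from the question, the result would look roughly like this (a sketch; only the rabbitmq.properties key is new):
apiVersion: v1
kind: ConfigMap
metadata:
  name: rabbitmq-config
data:
  enabled_plugins: |
    [rabbitmq_management,rabbitmq_peer_discovery_k8s].
  rabbitmq.conf: |
    loopback_users.guest = false
  rabbitmq.properties: |
    RABBITMQ_MNESIA_DIR=/data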

Persistent volume to windows not working on kubernetes

I have mapped a Windows folder into my Linux machine with
mount -t cifs //AUTOCHECK/OneStopShopWebAPI -o user=someuser,password=Aa1234 /xml_shared
and the following command
df -hk
gives me
//AUTOCHECK/OneStopShopWebAPI 83372028 58363852 25008176 71% /xml_shared
after that I create a yaml file with
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-nfs-jenkins-slave
spec:
  storageClassName: jenkins-slave-data
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 4Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-nfs-jenkins-slave
  labels:
    type: jenkins-slave-data2
spec:
  storageClassName: jenkins-slave-data
  capacity:
    storage: 4Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 192.168.100.109
    path: "//AUTOCHECK/OneStopShopWebAPI/jenkins_slave_shared"
this does not seem to work when I create a new pod
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-slave
  labels:
    label: jenkins-slave
spec:
  containers:
    - name: node
      image: node
      command:
        - cat
      tty: true
      volumeMounts:
        - mountPath: /var/jenkins_slave_shared
          name: jenkins-slave-vol
  volumes:
    - name: jenkins-slave-vol
      persistentVolumeClaim:
        claimName: pvc-nfs-jenkins-slave
Do I need to change the nfs? What is wrong with my logic?
Mounting the CIFS share on the Linux machine is correct, but you need to take a different approach to mount a CIFS volume under Kubernetes. Let me explain:
There are some differences between NFS and CIFS.
This site explains the whole process step by step: GitHub CIFS Kubernetes.
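For orientation, this is roughly the pattern that guide describes, using the fstab/cifs flexVolume plugin (a sketch: it assumes the plugin is installed on every node, and cifs-secret is a placeholder Secret holding the base64-encoded username and password):
apiVersion: v1
kind: Pod
metadata:
  name: jenkins-slave
spec:
  containers:
    - name: node
      image: node
      command: ["cat"]
      tty: true
      volumeMounts:
        - name: jenkins-slave-vol
          mountPath: /var/jenkins_slave_shared
  volumes:
    - name: jenkins-slave-vol
      flexVolume:
        driver: "fstab/cifs"    # requires the fstab/cifs flexVolume plugin on each node
        fsType: "cifs"
        secretRef:
          name: "cifs-secret"   # placeholder Secret with the CIFS username/password
        options:
          networkPath: "//AUTOCHECK/OneStopShopWebAPI/jenkins_slave_shared"
          mountOptions: "dir_mode=0755,file_mode=0644"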