sharing data between a cronjob and pod - kubernetes

Right now I have a CronJob that downloads data, and I want to share it with another container that processes the data as new files are uploaded. Is there a way, without any external services, to share this data between the CronJob pod and my main pod?
I've tried creating a persistent volume and persistent volume claim to share the data, but when the CronJob downloads the data it doesn't appear in the other pod even though the volume is mounted:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: download
spec:
  concurrencyPolicy: Forbid
  suspend: false
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: downloaded-data-claim
              persistentVolumeClaim:
                claimName: downloaded-data-claim
          # container and image is here where it downloads
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: downloaded-data
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  claimRef:
    name: downloaded-data-claim
    namespace: default
  hostPath:
    path: "/tmp/"
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: downloaded-data-claim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  volumeName: downloaded-data
And then the pod mounts the volume:
volumes:
  - name: downloaded-data-claim
    persistentVolumeClaim:
      claimName: downloaded-data-claim
  - name: output
    emptyDir: {}
containers:
  - name: "rand"
    image: <filler>
    imagePullPolicy: <filler>
    volumeMounts:
      - name: downloaded-data-claim
        mountPath: /input
      - name: output
        mountPath: /output
    resources:

Make sure you have created the CronJob in the right namespace, the one where your pod and PVC are (PersistentVolumes themselves are not namespaced).
Check that you have access to the directory where you want your data to be stored.
Actually, I don't think there is any other possibility than using external services. A hostPath volume is a directory on one node's filesystem, so pods scheduled to different nodes see different directories.
The most useful option is an NFS volume, but those depend on services and external NFS servers.
NFS stands for Network File System: it's a shared filesystem that can be accessed over the network.
The NFS server must already exist: Kubernetes doesn't run it, pods just access it.
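As a rough sketch of the NFS approach, both the CronJob and the long-running pod could mount the same existing export directly. The server address `nfs.example.internal`, the export path `/exports/downloads`, the `processor` pod name, and the `busybox` images and commands are placeholders assumed for illustration, not values from the question. `batch/v1` assumes a cluster recent enough to serve CronJob at that version.

```yaml
# Sketch only: server, path, images, and commands below are assumed placeholders.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: download
spec:
  schedule: "* * * * *"
  concurrencyPolicy: Forbid
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: download
              image: busybox
              # Stand-in for the real download step: writes a file to the share.
              command: ["/bin/sh", "-c", "date > /input/latest.txt"]
              volumeMounts:
                - name: shared-data
                  mountPath: /input
          volumes:
            - name: shared-data
              nfs:
                server: nfs.example.internal   # assumed; the server must already exist
                path: /exports/downloads       # assumed export path
---
apiVersion: v1
kind: Pod
metadata:
  name: processor
spec:
  containers:
    - name: rand
      image: busybox
      # Stand-in for the real processing step: lists the shared directory.
      command: ["/bin/sh", "-c", "while true; do ls -l /input; sleep 60; done"]
      volumeMounts:
        - name: shared-data
          mountPath: /input
  volumes:
    - name: shared-data
      nfs:
        server: nfs.example.internal
        path: /exports/downloads
```

Because the data lives on the NFS server rather than on one node's disk, it should not matter which node each pod is scheduled to.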

Related

Kubernetes Persistent Volume: MountPath directory created but empty

I have 2 pods, one that is writing files to a persistent volume and the other one supposedly reads those files to make some calculations.
The first pod writes the files successfully and when I display the content of the persistent volume using print(os.listdir(persistent_volume_path)) I get all the expected files. However, the same command on the second pod shows an empty directory. (The mountPath directory /data is created but empty.)
This is the TFJob yaml file:
apiVersion: kubeflow.org/v1
kind: TFJob
metadata:
  name: pod1
  namespace: my-namespace
spec:
  cleanPodPolicy: None
  tfReplicaSpecs:
    Worker:
      replicas: 1
      restartPolicy: Never
      template:
        spec:
          containers:
            - name: tensorflow
              image: my-image:latest
              imagePullPolicy: Always
              command:
                - "python"
                - "./program1.py"
                - "--data_path=./dataset.csv"
                - "--persistent_volume_path=/data"
              volumeMounts:
                - mountPath: "/data"
                  name: my-pv
          volumes:
            - name: my-pv
              persistentVolumeClaim:
                claimName: my-pvc
(respectively pod2 and program2.py for the second pod)
And this is the volume configuration:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: my-namespace
  labels:
    type: local
    app: tfjob
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-pv
  namespace: my-namespace
  labels:
    type: local
    app: tfjob
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/data"
Does anyone have any idea where's the problem exactly and how to fix it?

When two pods need to access a shared PersistentVolume with access mode ReadWriteOnce concurrently, they must run on the same node, since with this access mode the volume can only be mounted on a single node at a time.
To achieve this, some form of Pod Affinity must be applied, such that they are scheduled to the same node.
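One way this could look is a required pod affinity rule on the second pod. The sketch below uses a plain Pod rather than a TFJob for brevity, and it assumes the first pod carries a label `app: pod1-writer`; that label is hypothetical and does not appear in the question. The command arguments are likewise a guess at how `program2.py` would be invoked.

```yaml
# Sketch only: the label app=pod1-writer is assumed to be set on the first pod.
apiVersion: v1
kind: Pod
metadata:
  name: pod2
  namespace: my-namespace
spec:
  affinity:
    podAffinity:
      # Only schedule onto a node that runs a pod labelled app=pod1-writer.
      requiredDuringSchedulingIgnoredDuringExecution:
        - labelSelector:
            matchLabels:
              app: pod1-writer
          # Treat each node (by hostname) as its own topology domain.
          topologyKey: kubernetes.io/hostname
  containers:
    - name: tensorflow
      image: my-image:latest
      command: ["python", "./program2.py", "--persistent_volume_path=/data"]
      volumeMounts:
        - mountPath: "/data"
          name: my-pv
  volumes:
    - name: my-pv
      persistentVolumeClaim:
        claimName: my-pvc
```

With a required rule like this, the second pod would be expected to stay unscheduled until a matching pod exists, so the writer needs to be started first.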

Multiple Persistent Volumes with the same mount path Kubernetes

I have created 3 CronJobs in Kubernetes. The format is exactly the same for every one of them except the names. These are the following specs:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test-job-1 # for others it's test-job-2 and test-job-3
  namespace: cron-test
spec:
  schedule: "* * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: test-job-1 # for others it's test-job-2 and test-job-3
              image: busybox
              imagePullPolicy: IfNotPresent
              command:
                - "/bin/sh"
                - "-c"
              args:
                - cd database-backup && touch $(date +%Y-%m-%d:%H:%M).test-job-1 && ls -la # for others the filename includes test-job-2 and test-job-3 respectively
              volumeMounts:
                - mountPath: "/database-backup"
                  name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
          volumes:
            - name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
              persistentVolumeClaim:
                claimName: test-job-1-pvc # for others it's test-job-2-pvc and test-job-3-pvc
And also the following Persistent Volume Claims and Persistent Volume:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-job-1-pvc # for others it's test-job-2-pvc or test-job-3-pvc
  namespace: cron-test
spec:
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  resources:
    requests:
      storage: 1Gi
  volumeName: test-job-1-pv # depending on the name it's test-job-2-pv or test-job-3-pv
  storageClassName: manual
  volumeMode: Filesystem
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-job-1-pv # for others it's test-job-2-pv and test-job-3-pv
  namespace: cron-test
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/database-backup"
So all in all there are 3 CronJobs, 3 PersistentVolumes and 3 PersistentVolumeClaims. I can see that the PersistentVolumeClaims and PersistentVolumes are bound correctly to each other: test-job-1-pvc <--> test-job-1-pv, test-job-2-pvc <--> test-job-2-pv and so on. The pods associated with each PVC are also the corresponding pods created by each CronJob, for example test-job-1-1609066800-95d4m <--> test-job-1-pvc and so on. After letting the cron jobs run for a bit, I created another pod with the following spec to inspect test-job-1-pvc:
apiVersion: v1
kind: Pod
metadata:
  name: data-access
  namespace: cron-test
spec:
  containers:
    - name: data-access
      image: busybox
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data-access-volume
          mountPath: /database-backup
  volumes:
    - name: data-access-volume
      persistentVolumeClaim:
        claimName: test-job-1-pvc
Just a simple pod that keeps running all the time. When I get inside that pod with exec and see inside the /database-backup directory I see all the files created from all the pods created by the 3 CronJobs.
What did I expect to see?
I expected to see only the files created by test-job-1.
Is this something expected to happen? And if so how can you separate the PersistentVolumes to avoid something like this?

I suspect this is caused by the PersistentVolume definition: if you really only changed the name, all volumes are mapped to the same folder on the host.
hostPath:
  path: "/database-backup"
Try giving each volume a unique folder, e.g.:
hostPath:
  path: "/database-backup/volume1"
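Applied to the manifests from the question, the second volume might look like the sketch below. The subdirectory name `/database-backup/test-job-2` is an assumed naming scheme, and `type: DirectoryOrCreate` is an optional addition, not part of the original manifests, that asks the kubelet to create the directory if it is missing.

```yaml
# Sketch only: the per-job subdirectory name is an assumed convention.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-job-2-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    # Unique folder per volume so the jobs no longer share one directory.
    path: "/database-backup/test-job-2"
    # Optional: create the directory on the node if it does not exist yet.
    type: DirectoryOrCreate
```

The mountPath inside each container can stay `/database-backup`; only the path on the host needs to differ between volumes.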

How to have multiple pods access an existing NFS folder in Kubernetes?

I have a folder of TFRecords on a network that I want to expose to multiple pods. The folder has been exported via NFS.
I have tried creating a Persistent Volume, followed by a Persistent Volume Claim. However, that just creates a folder inside the NFS mount, which I don't want. Instead, I want the Pod to access the folder with the TFRecords.
I have listed the manifests for the PV and PVC.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-tfrecord-pv
spec:
  capacity:
    storage: 30Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    path: /media/veracrypt1/
    server: 1.2.3.4
    readOnly: false
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-tfrecord-pvc
  namespace: default
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: nfs-tfrecord
  resources:
    requests:
      storage: 1Gi

I figured it out. I was looking at the problem the wrong way: I didn't need any provisioning. Instead, what was needed was simply to mount the NFS volume within the container:
kind: Pod
apiVersion: v1
metadata:
  name: pod-using-nfs
spec:
  containers:
    - name: app
      image: alpine
      volumeMounts:
        - name: data
          mountPath: /mnt/data
      command: ["/bin/sh"]
      args: ["-c", "sleep 500000"]
  volumes:
    - name: data
      nfs:
        server: 1.2.3.4
        path: /media/foo/DATA

Creating a NFS sidecar for Kubernetes

I am trying to create an NFS sidecar for Kubernetes. The goal is to be able to mount an NFS volume to an existing pod without affecting performance. At the same time, I want to be able to mount the same NFS volume onto another pod or server (read-only perhaps) in order to view the content there. Has anyone tried this? Does anyone have a procedure?

Rather than use a sidecar I would suggest using a PersistentVolume which uses the NFS driver and PersistentVolumeClaim. If you use the RWX/ReadWriteMany access mode, you'll be able to mount the share into multiple pods.
For example, the PV:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: mypv
spec:
  capacity:
    storage: 2Gi
  nfs:
    server: my.nfs.server
    path: /myshare
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Recycle
the PVC:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: myclaim
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 2Gi
and mounted in a pod:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
spec:
  containers:
    - name: myfrontend
      image: nginx
      volumeMounts:
        - mountPath: "/var/www/html"
          name: mypd
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
Kubernetes Docs on Persistent Volumes
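For the read-only viewer mentioned in the question, a second pod could mount the same claim with `readOnly: true`. This is a minimal sketch: the pod name `myviewer`, the `busybox` image, and the mount path `/mnt/share` are assumptions chosen for illustration.

```yaml
# Sketch only: the pod name, image, and mount path are assumed placeholders.
apiVersion: v1
kind: Pod
metadata:
  name: myviewer
spec:
  containers:
    - name: viewer
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - mountPath: "/mnt/share"
          name: mypd
          # The container sees the share but cannot write to it.
          readOnly: true
  volumes:
    - name: mypd
      persistentVolumeClaim:
        claimName: myclaim
        # Mount the claim itself read-only for this pod.
        readOnly: true
```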

kubernetes persistence volume and persistence volume claim exceeded storage

Following the Kubernetes guide, I created a PV, a PVC and a pod. I claimed only 10Mi out of the 20Mi PV. I then copied 23Mi of data, which is more than the PV's capacity, but my pod is still running. Can anyone explain why?
pv-volume.yaml
kind: PersistentVolume
apiVersion: v1
metadata:
  name: task-pv-volume
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 20Mi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/data"
pv-claim.yaml
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: task-pv-claim
spec:
  storageClassName: manual
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Mi
pv-pod.yaml
kind: Pod
apiVersion: v1
metadata:
  name: task-pv-pod
spec:
  volumes:
    - name: task-pv-storage
      persistentVolumeClaim:
        claimName: task-pv-claim
  containers:
    - name: task-pv-container
      image: nginx
      ports:
        - containerPort: 80
          name: "http-server"
      volumeMounts:
        - mountPath: "/usr/share/nginx/html"
          name: task-pv-storage

You can probably keep copying data into the shared storage at /mnt/data (on your active node) through the pod's mount point, /usr/share/nginx/html, until the node stops responding. A hostPath volume is just a directory on the node, so the capacity declared in the PV is not enforced as a quota.
If you need to test this scenario under more realistic conditions, consider creating NFS persistent storage using GlusterFS or nfs-utils, or mounting a raw partition file made with dd.
In Minikube, nodes use ephemeral storage. Detailed information about node and pod resources can be found here:
https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/#node-allocatable
Hope this helps.
