persistent volume on openshift - kubernetes

I am new to OpenShift. I have deployed one application on OpenShift which uses a persistent volume to store files, and there is another application which picks up those files and processes them.
Now my challenge is that I do not understand how to use the same persistent volume for the two applications,
and how to pick up the files from the persistent volume. Is it the mountPath where the files get stored?

To leverage shared storage for use by two separate containers (in two independent pods), configure a PV of type NFS, or other shared storage such as GlusterFS, etc.
A basic example using NFS is available here: Sharing an NFS Persistent Volume (PV) Across Two Pods

You can address your requirement using one of the two options below; a sketch of the second option follows the list.
Use two containers in the same pod; that way both containers can share the volume.
Use NFS or some other persistent storage that supports ReadWriteMany; that way multiple pods can share the same volume.
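A minimal sketch of the second option, assuming an NFS export is reachable from the cluster; the server address, export path, and resource names below are placeholders, not values from the question:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany          # required so two pods can read/write concurrently
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: nfs.example.com  # assumption: your NFS server
    path: /exports/shared    # assumption: your export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""       # bind to the pre-created PV, skip dynamic provisioning
  resources:
    requests:
      storage: 5Gi
---
# First application: writes a file into the share.
apiVersion: v1
kind: Pod
metadata:
  name: producer
spec:
  containers:
    - name: producer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/out.txt && sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /data   # the shared volume appears here inside the container
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-pvc
```

The second application mounts the same claimName (shared-pvc) at whatever mountPath it likes; the files written by the first pod are visible there. So yes, the mountPath is where the shared files show up inside each container.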

Related

Best way to access and store files on kubernetes cluster without cloud resources

I need to persist files of different formats and sizes on a Kubernetes cluster volume and have them accessed simultaneously by several applications.
I know there are cloud resources like Azure Files that can help with this issue of simultaneous access to the same storage volume. However, one of my project requirements is not to use cloud resources to persist files.
So what can be the best way to persist files and access them simultaneously without using any cloud resources?
We are currently running NFS, which is behaving really well and is pretty straightforward to set up; however, there are several options for non-cloud storage:
cephfs
A cephfs volume allows an existing CephFS volume to be mounted into your Pod. Unlike emptyDir, which is erased when a pod is removed, the contents of a cephfs volume are preserved and the volume is merely unmounted. This means that a cephfs volume can be pre-populated with data, and that data can be shared between pods. The cephfs volume can be mounted by multiple writers simultaneously.
More info
iscsi (does not meet your needs!)
An iscsi volume allows an existing iSCSI (SCSI over IP) volume to be mounted into your Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of an iscsi volume are preserved and the volume is merely unmounted. This means that an iscsi volume can be pre-populated with data, and that data can be shared between pods.
A feature of iSCSI is that it can be mounted as read-only by multiple consumers simultaneously. This means that you can pre-populate a volume with your dataset and then serve it in parallel from as many Pods as you need. Unfortunately, iSCSI volumes can only be mounted by a single consumer in read-write mode. Simultaneous writers are not allowed.
More info
nfs
An nfs volume allows an existing NFS (Network File System) share to be mounted into a Pod. Unlike emptyDir, which is erased when a Pod is removed, the contents of an nfs volume are preserved and the volume is merely unmounted. This means that an NFS volume can be pre-populated with data, and that data can be shared between pods. NFS can be mounted by multiple writers simultaneously.
More info
Further info: Kubernetes Volumes
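For the nfs option above, a pod can mount an existing export directly, without a PV/PVC pair. A minimal sketch, assuming the server address and export path (both placeholders here) are reachable from the nodes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nfs-client            # hypothetical name
spec:
  containers:
    - name: app
      image: busybox
      command: ["sh", "-c", "ls /mnt/share && sleep 3600"]
      volumeMounts:
        - name: share
          mountPath: /mnt/share
  volumes:
    - name: share
      nfs:
        server: 10.0.0.10     # assumption: address of the NFS server
        path: /exports        # assumption: the exported directory
```

Several pods can declare the same nfs volume and write to it simultaneously, which is the multiple-writer behaviour described above.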

What storage to use for passing data between pods?

I am working with Kubernetes and I need to pass Parquet files containing datasets between pods, but I don't know which option will work best.
As far as I know, a persistent disk allows me to mount a shared volume on my pods, but with cloud storage I can share these files too.
The whole process is hosted on Google Cloud.
If you want to persist the data, you can use Google Filestore, which supports ReadWriteMany.
Persistent Volumes in GKE are backed by Persistent Disks.
The problem with these disks is that they only support ReadWriteOnce (RWO) (the volume can be mounted as read-write by a single node) and ReadOnlyMany (ROX) (the volume can be mounted read-only by many nodes) access modes.
Read more at: https://medium.com/#Sushil_Kumar/readwritemany-persistent-volumes-in-google-kubernetes-engine-a0b93e203180
With a disk, it won't be possible to share data between pods, as it only supports ReadWriteOnce: a single disk gets attached to a single node.
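To make the access-mode limitation concrete, here is a hedged sketch of a claim against GKE's default Persistent Disk-backed provisioning; the claim name is made up:

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pd-claim              # hypothetical name
spec:
  accessModes:
    - ReadWriteOnce           # a PD can be attached read-write to one node only,
                              # so pods on other nodes cannot share this volume
  resources:
    requests:
      storage: 10Gi
  # no storageClassName: GKE's default class provisions a Persistent Disk
```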
If you are looking to mount storage such as a cloud bucket behind the pod using a CSI driver, your file-write IO will be very slow; such storage gives better performance through its API.
You can also create an NFS server in Kubernetes and use that, which again gives you ReadWriteMany support.
GlusterFS and MinIO are other options; however, if you are looking for a managed NFS service, use Google Filestore.
I would say go with a local persistent volume when you need to pass large datasets, as it will be cost effective and efficient.
You should use Google Filestore as a file share. Then you need to:
create a PersistentVolume (PV)
create a PersistentVolumeClaim (PVC)
use the PVC with your pods
More details here
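A hedged sketch of those three steps; the Filestore IP address, share name, and resource names are assumptions, not values from the answer:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany            # Filestore is exported over NFS, so RWX works
  nfs:
    server: 10.0.0.2           # assumption: your Filestore instance IP
    path: /vol1                # assumption: your file share name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filestore-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 1Ti
```

Each pod then references claimName: filestore-pvc under persistentVolumeClaim in its volumes section, and all of them can read and write the same Parquet files.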

Why should I use Kubernetes Persistent Volumes instead of Volumes

To use storage inside Kubernetes PODs I can use volumes and persistent volumes. While the volumes like emptyDir are ephemeral, I could use hostPath and many other cloud based volume plugins which would provide a persistent solution in volumes itself.
In that case why should I be using Persistent Volume then?
It is very important to understand the main differences between Volumes and PersistentVolumes. Both Volumes and PersistentVolumes are Kubernetes resources which provide an abstraction of a data storage facility.
Volumes: let your pod write to a filesystem that exists as long as the pod exists. They also let you share data between containers in the same pod but data in that volume will be destroyed when the pod is restarted. Volume decouples the storage from the Container. Its lifecycle is coupled to a pod.
PersistentVolumes: serve as long-term storage in your Kubernetes cluster. They exist beyond containers, pods, and nodes. A pod uses a PersistentVolumeClaim to get read and write access to the persistent volume. A PersistentVolume decouples the storage from the Pod. Its lifecycle is independent. It enables safe pod restarts and sharing data between pods.
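A small sketch of the difference, with made-up names: the first pod's data vanishes with the pod, while the second pod's data lives in whatever PersistentVolume backs the claim.

```yaml
# Ephemeral: the emptyDir volume is created with the pod and deleted with it.
apiVersion: v1
kind: Pod
metadata:
  name: scratch-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: scratch
          mountPath: /tmp/work
  volumes:
    - name: scratch
      emptyDir: {}
---
# Durable: the claim points at a PersistentVolume whose lifecycle is independent
# of this pod, so the data survives pod deletion and restarts.
apiVersion: v1
kind: Pod
metadata:
  name: durable-pod
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-claim    # assumption: a PVC created beforehand
```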
When it comes to hostPath:
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
hostPath has its usage scenarios, but in general it is not recommended, for several reasons:
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
You don't always directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
If a node goes down you need the pod to be scheduled on other node where your locally provisioned volume will not be available.
hostPath would be a good fit if, for example, you would like to use it for a log collector running in a DaemonSet, as sketched below.
I recommend the Kubernetes Volumes Guide as a nice supplement to this topic.
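For that log-collector case, a hedged DaemonSet sketch; the image and names are placeholders for a real log agent:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: busybox              # stand-in for a real log agent
          command: ["sh", "-c", "ls /var/log && sleep 3600"]
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log            # node directory; acceptable here because
            type: Directory           # the DaemonSet runs on every node anyway
```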
PersistentVolumes are cluster-wide storage and allow you to manage the storage more centrally.
When you configure a volume (either using hostPath or any of the cloud-based volume plugins) then you need to do this configuration within the POD definition file. Every configuration information, required to configure storage for the volume, goes within the POD definition file.
When you have a large environment with a lot of users and a large number of PODs then users will have to configure storage every time for each POD they deploy. Whatever storage solution is used, the user who deploys the POD will have to configure that storage on all of his/her POD definition files. If a change needs to be made then the user will have to make this change on all of his/her PODs. After a certain scale, this is not the most optimal way to manage storage.
Instead, you would like to manage this centrally. You would like to manage the storage in such a way that an Administrator can create a large pool of storage and users can carve out a part of this storage as required, and this is exactly what you can do using PersistentVolumes and PersistentVolumeClaims.
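A hedged sketch of that split, with invented names: the administrator publishes the storage details once in a PV, and a user only requests capacity and an access mode via a PVC.

```yaml
# Administrator: describes the backing storage once.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pool-pv-01
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: nfs.internal       # assumption: backing-storage details live here,
    path: /exports/pool-01     # not in every pod definition
---
# User: asks for storage without knowing where it comes from.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
```

The pod definition then only carries the claim name, so if the administrator swaps out the backing storage, no pod definitions need to change.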
Use PersistentVolumes when you need to set up a database like MongoDB, Redis, Postgres or MySQL, because they are long-term storage and not deeply coupled with your pods; perfect for database applications, since the data will not die with the pods.
Avoid plain Volumes when you need long-term storage, because they will die with the pods!
In my case, when I have to store something, I will always go for persistent volumes!

kubernetes persistent volume for bare metal accessible on all nodes and pods

I was trying local or hostPath volumes on LAN bare-metal servers.
I tried local, but each node had its own copy of the data.
How can I use volumes across all the nodes and pods?
Persistent Volumes have access semantics. For example, on GCE, if you are using a Persistent Disk it can either be mounted as writable by a single pod or as read-only by multiple pods. If you want multi-writer semantics, you need to set up NFS or some other storage that lets you write from multiple pods. NFS can support multiple read/write clients.
In case you are interested in running NFS take a look: nfs-setup.
The NFS persistent volume and NFS claim give an indirection that allows multiple pods to refer to the NFS server using a symbolic name rather than a hardcoded server address.
Take a look: pv-multiple-pods.
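A minimal sketch of that indirection, assuming an in-cluster NFS server exposed through a Service whose ClusterIP is 10.96.0.50 (an invented address): pods only refer to the claim, and the server address lives in one place.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany            # many pods, on any node, can mount read-write
  nfs:
    server: 10.96.0.50         # assumption: ClusterIP of the in-cluster NFS Service
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-claim
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""
  resources:
    requests:
      storage: 10Gi
```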
If you want to share data through your cluster, then you need to use network storage.
You can't expect Kubernetes to just share your data across all the nodes of your cluster, so local storage and hostPath won't work in that case.
As @MaggieO said, you can set up and use an NFS server.
If you just want to try it out, you can also use your favorite cloud provider storage solution (AWS S3, GCP Bucket, Azure Disk, etc). You can see the full list here

DigitalOcean NFS vs do-block storage

I'm new to DigitalOcean and K8s and can't seem to wrap my head around this:
If I need to run multiple replicas of Nginx containers, should I use block storage or NFS storage? I want static HTML data shared by all the Nginx containers running in separate pods.
From my understanding, if I want to share data across multiple pods, I should be using NFS.
Taken from https://www.digitalocean.com/community/tutorials/how-to-set-up-readwritemany-rwx-persistent-volumes-with-nfs-on-digitalocean-kubernetes
The digitalocean-csi integrates a Kubernetes cluster with the DigitalOcean Block Storage product. A developer can use this to dynamically provision Block Storage volumes for containerized applications in Kubernetes. However, applications can sometimes require data to be persisted and shared across multiple Droplets. DigitalOcean’s default Block Storage CSI solution is unable to support mounting one block storage volume to many Droplets simultaneously. This means that this is a ReadWriteOnce (RWO) solution, since the volume is confined to one node. The Network File System (NFS) protocol, on the other hand, does support exporting the same share to many consumers. This is called ReadWriteMany (RWX), because many nodes can mount the volume as read-write. We can therefore use an NFS server within our cluster to provide storage that can leverage the reliable backing of DigitalOcean Block Storage with the flexibility of NFS shares.
Any clarification would be appreciated.
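To make the RWX part concrete, here is a hedged sketch of the Nginx side, assuming an RWX-capable claim named nginx-static already exists (for example backed by the in-cluster NFS export from the linked tutorial):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3                   # every replica serves the same files
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.25
          volumeMounts:
            - name: static
              mountPath: /usr/share/nginx/html
              readOnly: true
          ports:
            - containerPort: 80
      volumes:
        - name: static
          persistentVolumeClaim:
            claimName: nginx-static   # assumption: an existing ReadWriteMany PVC
```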