Kubernetes on GKE: how to write data on a PVC with ReadOnlyMany mode

As I want to share my PVC with multiple pods, I created a PVC with ReadOnlyMany mode on GKE. Now, how do I put data on that disk so all the other pods can use it? The documentation says GKE only supports ReadWriteOnce and ReadOnlyMany, so how do I get my read-only data onto the disk for the other pods to use?

You’ll need to seed the disk with data another way, before you mount it inside your Pods.
You could create the disk from a snapshot of another disk that you regularly update with the necessary data.
This could be done on a VM, or via another Pod that does the work.
Read the docs on ReadOnlyMany: https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes#using_compute_engine_persistent_disks_as_readonlymany
Understanding more about what problem you want to solve will be helpful. If you truly need to write from multiple sources, then using Filestore (NFS) may address that: https://cloud.google.com/community/tutorials/gke-filestore-dynamic-provisioning
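As a rough sketch of that pattern, assuming the data was already written to a GCE persistent disk named my-seeded-disk (a placeholder name): expose the disk as a ReadOnlyMany PersistentVolume and mount it read-only in every consumer Pod.

```yaml
# Sketch: a pre-populated GCE PD ("my-seeded-disk" is a placeholder) exposed as a
# ReadOnlyMany PersistentVolume that many Pods can mount read-only.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: my-seeded-disk    # disk created and filled with data beforehand
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""        # bind to the pre-created PV, skip dynamic provisioning
  volumeName: shared-data-pv
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: reader
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
          readOnly: true      # mount read-only in every consumer Pod
  volumes:
    - name: shared-data
      persistentVolumeClaim:
        claimName: shared-data-pvc
        readOnly: true
```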

Related

Why should I use Kubernetes Persistent Volumes instead of Volumes

To use storage inside Kubernetes Pods I can use volumes and persistent volumes. While volumes like emptyDir are ephemeral, I could use hostPath and many other cloud-based volume plugins that already provide persistence at the volume level.
In that case, why should I use PersistentVolumes at all?
It is very important to understand the main differences between Volumes and PersistentVolumes. Both are Kubernetes resources that provide an abstraction over a data storage facility.
Volumes let your pod write to a filesystem that exists as long as the pod exists. They also let you share data between containers in the same pod, but data in such a volume is destroyed when the pod is restarted. A volume decouples the storage from the container, but its lifecycle is coupled to the pod.
PersistentVolumes serve as long-term storage in your Kubernetes cluster. They exist beyond containers, pods, and nodes. A pod uses a PersistentVolumeClaim to get read and write access to the PersistentVolume. A PersistentVolume decouples the storage from the pod, and its lifecycle is independent; that enables safe pod restarts and sharing data between pods.
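To make the difference concrete, here is a minimal sketch (images and the claim name are illustrative): the first Pod uses an ephemeral emptyDir that disappears with the Pod, while the second mounts a PersistentVolumeClaim whose data survives Pod restarts.

```yaml
# Sketch: ephemeral emptyDir volume vs. a PersistentVolumeClaim-backed volume.
apiVersion: v1
kind: Pod
metadata:
  name: with-emptydir
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: scratch
          mountPath: /cache
  volumes:
    - name: scratch
      emptyDir: {}            # lives and dies with the Pod
---
apiVersion: v1
kind: Pod
metadata:
  name: with-pvc
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: data
          mountPath: /usr/share/nginx/html
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: my-claim   # placeholder: a PVC created separately
```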
When it comes to hostPath:
A hostPath volume mounts a file or directory from the host node's filesystem into your Pod.
hostPath has its usage scenarios, but in general it is not recommended, for several reasons:
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
You don't always directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.
hostPath would be a good fit if, for example, you would like to use it for a log collector running in a DaemonSet.
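For illustration, a minimal DaemonSet sketch along those lines (the log-agent image is a placeholder): every node runs one collector pod that mounts the node's /var/log via hostPath.

```yaml
# Sketch: one of the few cases where hostPath fits well - a per-node log collector.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: agent
          image: log-agent:latest    # placeholder image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log           # node directory mounted into the Pod
            type: Directory
```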
I recommend the Kubernetes Volumes Guide as a nice supplement to this topic.
PersistentVolumes are cluster-wide storage and allow you to manage storage more centrally.
When you configure a volume (either using hostPath or any of the cloud-based volume plugins), you do that configuration within the Pod definition file: every piece of configuration information required to set up storage for the volume goes into the Pod definition file.
When you have a large environment with a lot of users and a large number of Pods, users have to configure storage every time they deploy a Pod. Whatever storage solution is used, the user who deploys the Pod has to configure that storage in all of their Pod definition files, and if a change is needed, they have to make it in all of their Pods. Beyond a certain scale, this is not an optimal way to manage storage.
Instead, you would like to manage this centrally: an administrator creates a large pool of storage, and users carve out parts of it as required. That is exactly what PersistentVolumes and PersistentVolumeClaims let you do.
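A minimal sketch of that split, assuming dynamic provisioning on GCE (names are illustrative): the administrator defines the StorageClass once, and each user only writes a small PVC against it.

```yaml
# Sketch: admin-side StorageClass vs. user-side PersistentVolumeClaim.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast                 # created once by the administrator
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data             # created by the user deploying the workload
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 5Gi
```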
Use PersistentVolumes when you need to set up a database like MongoDB, Redis, Postgres or MySQL: they are long-term storage, not tightly coupled to your pods, and they will not die with them, which makes them a good fit for database applications.
Avoid plain Volumes when you need long-term storage, because they die with the pods!
In my case, when I have to store something, I will always go for persistent volumes!

minio for mariadb in kubernetes

I'm running a k3s single-node cluster and use the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at MinIO to use on top of local-path as storage. But I'm not sure it's the right choice, because my workloads primarily use MariaDB for data, and I read that an S3-compatible bucket isn't the best fit for database applications.
I hope you can help me figure this out.
If you don't want to use object storage then here are your options for running a local storage provisioner:
GlusterFS StorageClass
There isn't a lot of documentation on how to set it up, but if you know your way around GlusterFS it'll be a good option.
local-path-provisioner
It provides a way for Kubernetes users to utilize the local storage in each node.
OpenEBS -> has a local volume storage engine, but I think this is not designed to work on a shared volume mount, and it ends up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
Longhorn [recommended] (see the PVC sketch after this list)
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
rook
Rook is a storage operator for Kubernetes that supports multiple storage backends. Don't use the NFS one though, because we hit a wall when using it with our DBs.
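As referenced in the Longhorn item above, a minimal sketch, assuming Longhorn is installed with its default longhorn StorageClass (names, size and password handling are placeholders): MariaDB keeps its data on a Longhorn block volume instead of object storage.

```yaml
# Sketch: MariaDB on a Longhorn-backed block volume rather than an S3 bucket.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mariadb-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: longhorn
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mariadb
spec:
  replicas: 1
  strategy:
    type: Recreate             # RWO volume: never run two MariaDB pods at once
  selector:
    matchLabels:
      app: mariadb
  template:
    metadata:
      labels:
        app: mariadb
    spec:
      containers:
        - name: mariadb
          image: mariadb:10.6
          env:
            - name: MARIADB_ROOT_PASSWORD
              value: changeme  # use a Secret in practice
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: mariadb-data
```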

Installing Postgres in GKE as NFS with multiple micro-services deployed

I have a GKE cluster with 6-7 micro-services deployed. I need a Postgres DB installed inside GKE (not Cloud SQL, because of cost). Looking at the different types of persistent volumes: if multiple micro-services access the same DB, should I go with NFS, or would a PVC backed by a normal disk be enough (local storage is not an option anyway)?
I'd appreciate your thoughts on this.
Everything depends on your scenario. In general you should look at the access modes when considering which volume plugin you want to use.
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume.
In the documentation below you will find a table with the different volume plugins and the access modes they support.
According to the update from your comment, you have only one node. With that setup, you can use almost every volume plugin which supports the RWO access mode.
ReadWriteOnce -- the volume can be mounted as read-write by a single node.
There are 2 other access modes which should be considered if you would like to use the volume on more than one node.
ReadOnlyMany -- the volume can be mounted read-only by many nodes
ReadWriteMany -- the volume can be mounted as read-write by many nodes
So in your case you can use gcePersistentDisk, as it supports ReadWriteOnce and ReadOnlyMany.
Using NFS would benefit if you would like to access this PV from many nodes.
NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
Just as an addition, if this is for learning purposes, you can also check Local Persistent Volumes. An example can be found in this tutorial, however it would require a few updates, like the image or apiVersion.
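As a sketch of the PVC route (names are placeholders; "standard" is assumed to be the cluster's default PD-backed StorageClass): Postgres gets a single ReadWriteOnce claim, and the micro-services connect to it over a Service rather than mounting the disk themselves.

```yaml
# Sketch: Postgres only needs a ReadWriteOnce disk; the micro-services reach it
# via the Service DNS name instead of sharing the volume. The Postgres
# Deployment/StatefulSet itself (not shown) would carry the app: postgres label
# and mount the postgres-data claim.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data
spec:
  accessModes: [ReadWriteOnce]
  storageClassName: standard
  resources:
    requests:
      storage: 10Gi
---
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  selector:
    app: postgres
  ports:
    - port: 5432
```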

Kubernetes shared persistent volume between multiple nodes

I am currently looking into setting up Kubernetes pods for a project on GCP.
The problem: I need to set up a persistent shared volume which will be used by multiple nodes. All nodes must be able to read from the volume, and only one node must be able to write to it. So I need some advice on the best way to achieve that.
I have checked the Kubernetes documentation and know that GCEPersistentDisk does not support ReadWriteMany, but I think that access mode would be overkill anyway. Regarding ReadOnlyMany, I get that nodes can read from the PV, but I don't understand how, or what, can actually modify the PV in this case. Currently my best bet is setting up NFS with a GCE persistent disk.
Also the solution should be able to run on the cloud or on premise. Any advice will be appreciated :)
According to official documentation:
A PVC to PV binding is a one-to-one mapping.
A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
So I am afraid that it would be impossible to do it the way you described.
However, you may want to try a task queue.
Please let me know if that helped.
Assuming there is some NFS space available, it could be possible to create two PersistentVolumeClaims (PVCs) for it: one read-only, one read-write.
Then you could have two PersistentVolumes, each binding one-to-one to one of the PVCs.
Now create two deployments. One of them describes the pod with the writing application, and you ensure you have one replica. The other deployment describes the reading application, and you can scale it to whatever amount you like.
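A minimal sketch of that layout, assuming an existing NFS export at nfs-server.example.com:/exports/shared (placeholder address and path): two PVs pointing at the same export, one read-write for the single writer and one read-only for the readers, each bound statically to its own PVC.

```yaml
# Sketch: two PVs over the same NFS export, one read-write, one read-only.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-rw
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: nfs-server.example.com   # placeholder NFS server
    path: /exports/shared
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-ro
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadOnlyMany
  nfs:
    server: nfs-server.example.com
    path: /exports/shared
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-rw              # used by the single-replica writer Deployment
spec:
  accessModes: [ReadWriteMany]
  storageClassName: ""
  volumeName: shared-rw
  resources:
    requests:
      storage: 50Gi
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-ro              # used by the scalable reader Deployment
spec:
  accessModes: [ReadOnlyMany]
  storageClassName: ""
  volumeName: shared-ro
  resources:
    requests:
      storage: 50Gi
```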

Cannot use existing persistentVolumes that already used by another nodes in Kubernetes Google Compute Platform

I am trying to remain on the free tier of Google Cloud Platform, which only permits 3 nodes and 30 GB of storage; where the cluster was created, each node is mapped to its own 10 GB disk.
And when I tried to mount a persistentVolume and Claim to the existing disks, the error shows:
Attach failed for volume "myapp-pv" : googleapi: Error 400: The disk resource 'projects/myapp-dev/zones/us-central1-a/disks/gke-myapp-dev-clus-default-pool-64e30c4b-dvkc' is already being used by 'projects/myapp-dev/zones/us-central1-a/instances/gke-myapp-dev-clus-default-pool-64e30c4b-dvkc
The working solution for me is to create another disk, but the problem is that this goes beyond the free tier. I wonder how we can stay on the free tier without creating another persistent disk in GCP?
And when I tried to mount persistentVolume and Claims to existing Disks, the error shows
This error is happening because of this constraint of PV on GCE:
Important! A volume can only be mounted using one access mode at a time, even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce by a single node or ReadOnlyMany by many nodes, but not at the same time.
The table given in the above link shows that GCEPersistentDisk can't be mounted as ReadWriteMany, so if you need to connect it in that way you have to use some other volume plugin.
I wonder how can we stay in free-tier without creating another persistentDisk in GCP?
Just some thoughts... With the free tier you are limited in the number of nodes and the disk space available:
You can always 'simulate' ReadWriteMany with the NFS volume plugin, for example (installing your own provisioner for NFS), provided that your use case does not exclude NFS. The downside is that you need to install the NFS provisioner (squeeze it into your capacity) and it is not really well suited for fast IO (databases and such).
You can use hostPath on each of the nodes and manually juggle pods around, but that is prone to data loss and not really a proper Kubernetes approach to PV handling. It is something to consider if you need fast IO (you are testing with databases), and a proper backup should be in place to avoid data loss if a node dies.
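As a sketch of that second option done with a local PersistentVolume rather than bare hostPath (the path is a placeholder; the node name is taken from the error above): nodeAffinity on the PV makes the scheduler place the consuming pod on the node that actually holds the data, instead of juggling pods by hand.

```yaml
# Sketch: a local PV pinned to one node, so pods using it land on that node.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-data
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /mnt/disks/data             # directory prepared on the node beforehand
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - gke-myapp-dev-clus-default-pool-64e30c4b-dvkc  # node that holds the data
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer   # delay binding until a pod is scheduled
```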