Installing Postgres in GKE as NFS with multiple micro-services deployed - kubernetes

I have a GKE cluster with roughly 6-7 micro-services deployed. I need a Postgres DB installed inside GKE (not Cloud SQL, because of cost). Looking at the different types of persistent volumes: since multiple micro-services access the same DB, should I go with NFS, or would a PVC backed by a normal disk be enough (not local storage, in any case)?
Requesting your thoughts on this.

Everything depends on your scenario. In general you should look at the AccessModes when you are considering which Volume Plugin you want to use.
A PersistentVolume can be mounted on a host in any way supported by the resource provider. As shown in the table below, providers will have different capabilities and each PV's access modes are set to the specific modes supported by that particular volume.
In the documentation below, you will find a table with the different Volume Plugins and the Access Modes they support.
According to the update from your comment, you have only one node. With that setup, you can use almost any volume that supports the RWO access mode.
ReadWriteOnce -- the volume can be mounted as read-write by a single node.
There are 2 other Access Modes which should be considered if you would like to use the volume on more than 1 node.
ReadOnlyMany -- the volume can be mounted read-only by many nodes
ReadWriteMany -- the volume can be mounted as read-write by many nodes
So in your case you can use gcePersistentDisk, as it supports ReadWriteOnce and ReadOnlyMany.
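For example, a minimal sketch of such a claim (the name, size, and StorageClass name, commonly standard on GKE, are placeholders to adjust):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: postgres-data            # placeholder name
spec:
  accessModes:
    - ReadWriteOnce              # a gcePersistentDisk can be mounted read-write by one node
  storageClassName: standard     # GKE's default StorageClass, backed by gcePersistentDisk
  resources:
    requests:
      storage: 10Gi              # placeholder size

Note that the other micro-services would normally reach Postgres over its Service rather than mounting the disk themselves, so only the Postgres pod needs the volume and ReadWriteOnce is enough for that part.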
Using NFS would benefit if you would like to access this PV from many nodes.
NFS can support multiple read/write clients, but a specific NFS PV might be exported on the server as read-only. Each PV gets its own set of access modes describing that specific PV's capabilities.
Just as an addition, if this is for learning purposes, you can also check Local Persistent Volumes. An example can be found in this tutorial, however it would require a few updates, such as the image or apiVersion.

Related

Why should I use Kubernetes Persistent Volumes instead of Volumes

To use storage inside Kubernetes pods I can use volumes and persistent volumes. While volumes like emptyDir are ephemeral, I could use hostPath and many other cloud-based volume plugins, which would provide a persistent solution within volumes themselves.
In that case, why should I be using a PersistentVolume then?
It is very important to understand the main differences between Volumes and PersistentVolumes. Both are Kubernetes resources which provide an abstraction of a data storage facility.
Volumes: let your pod write to a filesystem that exists as long as the pod exists. They also let you share data between containers in the same pod, but data in that volume is destroyed when the pod is restarted (see the emptyDir sketch below). A Volume decouples the storage from the container, but its lifecycle is coupled to the pod.
PersistentVolumes: serve as long-term storage in your Kubernetes cluster. They exist beyond containers, pods, and nodes. A pod uses a PersistentVolumeClaim to get read and write access to the PersistentVolume. A PersistentVolume decouples the storage from the pod, and its lifecycle is independent, which enables safe pod restarts and sharing data between pods.
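As a small illustration of the first case, a plain emptyDir volume shared by two containers lives and dies with its pod (a sketch only; images and commands are purely illustrative):

apiVersion: v1
kind: Pod
metadata:
  name: shared-scratch
spec:
  containers:
    - name: writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/file && sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /data
    - name: reader
      image: busybox
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: scratch
          mountPath: /data
  volumes:
    - name: scratch
      emptyDir: {}               # both containers see it; it vanishes with the pod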
When it comes to hostPath:
A hostPath volume mounts a file or directory from the host node's
filesystem into your Pod.
hostPath has its usage scenarios, but in general it is not recommended, for several reasons:
Pods with identical configuration (such as created from a PodTemplate) may behave differently on different nodes due to different files on the nodes
The files or directories created on the underlying hosts are only writable by root. You either need to run your process as root in a privileged Container or modify the file permissions on the host to be able to write to a hostPath volume
You don't always directly control which node your pods will run on, so you're not guaranteed that the pod will actually be scheduled on the node that has the data volume.
If a node goes down, the pod needs to be scheduled on another node, where your locally provisioned volume will not be available.
hostPath would be a good fit if, for example, you would like to use it for a log collector running in a DaemonSet.
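Such a log collector might look roughly like the following sketch (the image name is a placeholder):

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: log-collector
spec:
  selector:
    matchLabels:
      app: log-collector
  template:
    metadata:
      labels:
        app: log-collector
    spec:
      containers:
        - name: collector
          image: example/log-collector:latest   # placeholder image
          volumeMounts:
            - name: varlog
              mountPath: /var/log
              readOnly: true
      volumes:
        - name: varlog
          hostPath:
            path: /var/log       # the node's own log directory
            type: Directory

Here reading per-node files is the whole point, which is why hostPath fits this case.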
I recommend the Kubernetes Volumes Guide as a nice supplement to this topic.
PersistentVolumes are cluster-wide storage and allow you to manage storage more centrally.
When you configure a volume (either using hostPath or any of the cloud-based volume plugins), you have to do that configuration within the pod definition file. All the configuration information required to set up storage for the volume goes into the pod definition file.
When you have a large environment with a lot of users and a large number of pods, every user has to configure storage for each pod they deploy. Whatever storage solution is used, the user who deploys the pod has to configure that storage in all of his/her pod definition files, and if a change is needed, it has to be made across all of those pods. Beyond a certain scale, this is not the most optimal way to manage storage.
Instead, you would like to manage this centrally: an administrator creates a large pool of storage, and users carve out parts of it as required. That is exactly what PersistentVolumes and PersistentVolumeClaims give you.
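A rough sketch of that split, using an NFS export purely as an example backend (server, path, names, and sizes are placeholders):

# Administrator side: the PersistentVolume describes the actual storage.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-pool-01               # placeholder name
spec:
  capacity:
    storage: 50Gi
  accessModes:
    - ReadWriteOnce
  nfs:
    server: 10.0.0.10            # placeholder server address
    path: /exports/data          # placeholder export path
---
# User side: the pod definition only ever references this claim and never
# needs to know where the storage actually lives.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi              # the claim binds to a matching PV from the pool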
Use PersistentVolumes when you need to set up a database like MongoDB, Redis, Postgres or MySQL: they provide long-term storage that is not deeply coupled with your pods, so the data will not die with them. That makes them a good fit for database applications.
Avoid plain Volumes when you need long-term storage, because they die with the pods.
In my case, whenever I have to store something, I always go for PersistentVolumes.

Is Kubernetes local/csi PV content synced into a new node?

According to the documentation:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned ... It is a resource in the cluster just like a node is a cluster resource...
So I was reading about all the currently available plugins for PVs and I understand that for 3rd-party / out-of-cluster storage this doesn't matter (e.g. storing data in EBS, Azure or GCE disks), because there are few or no implications when adding or removing nodes from a cluster. However, there are different ones, such as (ignoring hostPath, as that works only for single-node clusters):
csi
local
which (at least from what I've read in the docs) don't require 3rd-party vendors/software.
But also:
... local volumes are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume becomes inaccessible by the pod. The pod using this volume is unable to run. Applications using local volumes must be able to tolerate this reduced availability, as well as potential data loss, depending on the durability characteristics of the underlying disk.
The local PersistentVolume requires manual cleanup and deletion by the user if the external static provisioner is not used to manage the volume lifecycle.
Use-case
Let's say I have a single-node cluster with a single local PV and I want to add a new node to the cluster, so I have 2-node cluster (small numbers for simplicity).
Will the data from an already existing local PV be replicated 1:1 onto the new node (i.e. one PV with two nodes of redundancy), or is it strictly bound to the existing node only?
If the already existing PV can't be adjusted from 1 to 2 nodes, can a new PV (created from scratch) be set up so that it is replicated 1:1 between 2+ nodes in the cluster?
Alternatively, if not, what would be the correct approach without using a 3rd-party out-of-cluster solution? Would using csi change the overall approach, or is the redundancy story the same, just with a different "engine" under the hood?
Can a new PV be created so it's 1:1 replicated between 2+ nodes on the cluster?
None of the standard volume types are replicated at all. If you can use a volume type that supports ReadWriteMany access (most readily NFS) then multiple pods can use it simultaneously, but you would have to run the matching NFS server.
Of the volume types you reference:
hostPath is a directory on the node the pod happens to be running on. It's not a directory on any specific node, so if the pod gets recreated on a different node, it will refer to the same directory but on the new node, presumably with different content. Aside from basic test scenarios I'm not sure when a hostPath PersistentVolume would be useful.
local is a directory on a specific node, or at least following a node-affinity constraint. Kubernetes knows that not all storage can be mounted on every node, so this automatically constrains the pod to run on the node that has the directory (assuming the node still exists).
csi is an extremely generic extension mechanism, so that you can run storage drivers that aren't on the list you link to. There are some features that might be better supported by the CSI version of a storage backend than the in-tree version. (I'm familiar with AWS: the EBS CSI driver supports snapshots and resizing; the EFS CSI driver can dynamically provision NFS directories.)
In the specific case of a local test cluster (say, using kind) using a local volume will constrain pods to run on the node that has the data, which is more robust than using a hostPath volume. It won't replicate the data, though, so if the node with the data is deleted, the data goes away with it.
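To make the node-affinity constraint described above concrete, a local PersistentVolume might look roughly like this sketch (path, node name, and StorageClass are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage          # placeholder class
  local:
    path: /mnt/disks/ssd1                  # placeholder directory on the node
  nodeAffinity:                            # pins consumers of this PV to that one node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - worker-1                 # placeholder node name

Nothing in this object copies the data anywhere else; if worker-1 is deleted, the data is gone with it.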

Kubernetes cluster Mysql Nodes Storage

We have started setting up a Kubernetes cluster. In production we have 4 MySQL nodes (2 active masters, 2 active slaves). All servers are on-premises; there is NO cloud provider involved.
Now how do I configure storage? I mean, should I use PV / PVC? How will it work? Should I use local PVs? Can someone explain this to me?
You need to use PersistentVolumes and PersistentVolumeClaims in order to achieve that.
A PersistentVolume (PV) is a piece of storage in the cluster that has
been provisioned by an administrator or dynamically provisioned using
Storage Classes.
A PersistentVolumeClaim (PVC) is a request for storage by a user.
Claims can request specific size and access modes (e.g., they can be
mounted once read/write or many times read-only).
Containers are ephemeral: when a container is restarted, all changes made before the restart are lost. Databases, however, expect their data to be persistent, therefore you need persistent volumes. You have to create a storage claim, and the pod must be configured to mount the claimed storage.
Here you will find a simple guide showing how to deploy MySQL with a PersistentVolume. However, I strongly recommend getting familiar with the official docs that I have linked in order to fully understand the concept and adjust the access mode, class, size, etc according to your needs.
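As a quick illustration of the claim-and-mount pattern (a sketch only; names, sizes, and the password handling are placeholders, and in a real setup the password belongs in a Secret):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: mysql-data
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi                        # placeholder size
---
apiVersion: v1
kind: Pod
metadata:
  name: mysql
spec:
  containers:
    - name: mysql
      image: mysql:8.0
      env:
        - name: MYSQL_ROOT_PASSWORD
          value: changeme                  # use a Secret in a real setup
      volumeMounts:
        - name: data
          mountPath: /var/lib/mysql        # MySQL's data directory
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: mysql-data              # mounts the claimed storage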
Please let me know if that helped.

DigitalOcean NFS vs do-block storage

I'm new to DigitalOcean and K8S and can't seem to wrap my head around this:
If I need to run multiple replicas of Nginx containers, should I use block storage or NFS storage? I want static HTML data shared by all the Nginx containers running in separate pods.
From my understanding, if I want to share data across multiple pods, I should be using NFS.
Taken from https://www.digitalocean.com/community/tutorials/how-to-set-up-readwritemany-rwx-persistent-volumes-with-nfs-on-digitalocean-kubernetes
The digitalocean-csi integrates a Kubernetes cluster with the DigitalOcean Block Storage product. A developer can use this to dynamically provision Block Storage volumes for containerized applications in Kubernetes. However, applications can sometimes require data to be persisted and shared across multiple Droplets. DigitalOcean’s default Block Storage CSI solution is unable to support mounting one block storage volume to many Droplets simultaneously. This means that this is a ReadWriteOnce (RWO) solution, since the volume is confined to one node. The Network File System (NFS) protocol, on the other hand, does support exporting the same share to many consumers. This is called ReadWriteMany (RWX), because many nodes can mount the volume as read-write. We can therefore use an NFS server within our cluster to provide storage that can leverage the reliable backing of DigitalOcean Block Storage with the flexibility of NFS shares.
Any clarification would be appreciated.
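To illustrate what the quoted passage means in practice, the shared static content would be requested as a ReadWriteMany claim along these lines (a sketch only; it assumes an NFS-backed StorageClass, here hypothetically named nfs, has already been set up as in the linked tutorial):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-html
spec:
  accessModes:
    - ReadWriteMany                        # all Nginx replicas can mount it read-write
  storageClassName: nfs                    # hypothetical NFS-backed StorageClass
  resources:
    requests:
      storage: 1Gi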

Cannot use existing persistentVolumes already used by other nodes in Kubernetes on Google Compute Platform

I tried to remain on the free tier of Google Cloud Platform, which only permits 3 nodes and 30 GB of storage for the cluster; when the cluster was created, each node was mapped to its own 10 GB disk.
And when I tried to mount a persistentVolume and claim onto one of the existing disks, the error shows:
Attach failed for volume "myapp-pv" : googleapi: Error 400: The disk resource 'projects/myapp-dev/zones/us-central1-a/disks/gke-myapp-dev-clus-default-pool-64e30c4b-dvkc' is already being used by 'projects/myapp-dev/zones/us-central1-a/instances/gke-myapp-dev-clus-default-pool-64e30c4b-dvkc
The working solution for me is to create another disk, but the problem is that this goes beyond the free tier. I wonder how we can stay within the free tier without creating another persistentDisk in GCP?
And when I tried to mount a persistentVolume and claim onto one of the existing disks, the error shows
This error is happening because of this constraint on PVs in GCE:
Important! A volume can only be mounted using one access mode at a time,
even if it supports many. For example, a GCEPersistentDisk can be mounted as ReadWriteOnce
by a single node or ReadOnlyMany by many nodes, but not at the same time.
The table given in the link above shows that GCEPersistentDisk can't be mounted as ReadWriteMany, so if you need to connect to it in that way you have to use some other volume plugin.
I wonder how we can stay within the free tier without creating another persistentDisk in GCP?
Just some thoughts... With the free tier you are limited in the number of nodes and the amount of disk space available:
You can always "simulate" ReadWriteMany with the NFS volume plugin, for example (installing your own provisioner for NFS), provided that your use case does not rule out NFS. The downside is that you need to install the NFS provisioner (squeeze it into your capacity) and it is not really well suited for fast I/O (databases and the like); a rough sketch follows after this list.
You can use hostPath on each of the nodes and manually juggle pods around, but that is prone to data loss and not really a proper Kubernetes approach to PV handling. It is something to consider if you need fast I/O (you are testing with databases), and proper backups should be in place to avoid data loss if a node dies.
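For the NFS option, once an in-cluster NFS server is running, a PersistentVolume pointing at it could look roughly like this sketch (the server address and export path are placeholders for whatever your NFS server Service exposes):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-shared
spec:
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteMany                        # many nodes can mount it read-write
  nfs:
    server: 10.96.0.20                     # placeholder: ClusterIP of the NFS server Service
    path: /                                # placeholder export path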