minio for mariadb in kubernetes - kubernetes

I'm running a k3s single node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at minio to use on top of the local-path as storage. But I'm not sure if it's the right choice, cause I my workloads primarily use mariadb for data and I read, that an s3 compatible bucket isn't the best for database applications.
I hope you can help me figure this out.

If you don't want to use object storage then here are your options for running a local storage provisioner:
GlusterFS StorageClass
Doesn't have lot of documentation on how to set it up. But if you know your way around GlusterFS It'll be a good option.
local-path-provisioner
I
t provides a way for the Kubernetes users to utilize the local storage in each node
OpenEBS -> has a local volume storage engine but I think this is not designed to work on a shared volume mount and it end up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
longhorn [recommened]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
rook
Rook is a storage operators for Kubernetes, It supports multiple storage backends. Don't use the NFS one tho cause we hit a wall when using it with our DBs.

Related

Is Kubernetes local/csi PV content synced into a new node?

According to the documentation:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned ... It is a resource in the cluster just like a node is a cluster resource...
So I was reading about all currently available plugins for PVs and I understand that for 3rd-party / out-of-cluster storage this doesn't matter (e.g. storing data in EBS, Azure or GCE disks) because there are no or very little implications when adding or removing nodes from a cluster. However, there are different ones such as (ignoring hostPath as that works only for single-node clusters):
csi
local
which (at least from what I've read in the docs) don't require 3rd-party vendors/software.
But also:
... local volumes are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume becomes inaccessible by the pod. The pod using this volume is unable to run. Applications using local volumes must be able to tolerate this reduced availability, as well as potential data loss, depending on the durability characteristics of the underlying disk.
The local PersistentVolume requires manual cleanup and deletion by the user if the external static provisioner is not used to manage the volume lifecycle.
Use-case
Let's say I have a single-node cluster with a single local PV and I want to add a new node to the cluster, so I have 2-node cluster (small numbers for simplicity).
Will the data from an already existing local PV be 1:1 replicated into the new node as in having one PV with 2 nodes of redundancy or is it strictly bound to the existing node only?
If the already existing PV can't be adjusted from 1 to 2 nodes, can a new PV (created from scratch) be created so it's 1:1 replicated between 2+ nodes on the cluster?
Alternatively if not, what would be the correct approach without using a 3rd-party out-of-cluster solution? Will using csi cause any change to the overall approach or is it the same with redundancy, just different "engine" under the hood?
Can a new PV be created so it's 1:1 replicated between 2+ nodes on the cluster?
None of the standard volume types are replicated at all. If you can use a volume type that supports ReadWriteMany access (most readily NFS) then multiple pods can use it simultaneously, but you would have to run the matching NFS server.
Of the volume types you reference:
hostPath is a directory on the node the pod happens to be running on. It's not a directory on any specific node, so if the pod gets recreated on a different node, it will refer to the same directory but on the new node, presumably with different content. Aside from basic test scenarios I'm not sure when a hostPath PersistentVolume would be useful.
local is a directory on a specific node, or at least following a node-affinity constraint. Kubernetes knows that not all storage can be mounted on every node, so this automatically constrains the pod to run on the node that has the directory (assuming the node still exists).
csi is an extremely generic extension mechanism, so that you can run storage drivers that aren't on the list you link to. There are some features that might be better supported by the CSI version of a storage backend than the in-tree version. (I'm familiar with AWS: the EBS CSI driver supports snapshots and resizing; the EFS CSI driver can dynamically provision NFS directories.)
In the specific case of a local test cluster (say, using kind) using a local volume will constrain pods to run on the node that has the data, which is more robust than using a hostPath volume. It won't replicate the data, though, so if the node with the data is deleted, the data goes away with it.

Are Pods forced to run on nodes where their persistent volumes exist?

I'm teaching myself Kubernetes with a 5 Rpi cluster, and I'm a bit confused by the way Kubernetes treats Persistent Volumes with respect to Pod Scheduling.
I have 4 worker nodes using ext4 formatted 64GB micro SD cards. It's not going to give GCP or AWS a run for their money, but it's a side project.
Let's say I create a Persistent volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC, is that service then forced to be scheduled on worker1?
Should I be looking into distributed file systems like Ceph or Hdfs so that Pods aren't restricted to being scheduled on a particular node?
Sorry if this seems like a stupid question, I'm self taught and still trying to figure this stuff out! (Feel free to improve my tl;dr doc for kubernetes with a pull req)
just some examples, as already mentioned it depends on your storage system, as i see you use the local storage option
Local Storage:
yes the pod needs to be run on the same machine where the pv is located (your case)
ISCSI/Trident San:
no, the node will mount the iscsi block device where the pod will be scheduled
(as mentioned already volume binding mode is an important keyword, its possible you need to set this to 'WaitForFirstConsumer')
NFS/Trident Nas:
no, its nfs, mountable from everywhere if you can access and auth against it
VMWare VMDK's:
no, same as iscsi, the node which gets the pod scheduled mounts the vmdk from the datastore
ceph/rook.io:
no, you get 3 options for storage, file, block an object storage, every type is distributed so you can schedule a pod on every node.
also ceph is the ideal system for carrying a distributed software defined storage on commodity hardware, what i can recommend is https://rook.io/ basically an opensource ceph on 'container-steroids'
Let's say I create a Persistent volume Claim requesting 10GB of storage on worker1, and I deploy a service which relies on this PVC, is that service then forced to be scheduled on worker1?
This is a good question. How this works depends on your storage system. The StorageClass defined for your Persistent Volume Claim contains information about Volume Binding Mode. It is common to use dynamic provisioning volumes, so that the volume is first allocated when a user/consumer/Pod is scheduled. And typically this volume does not exist on the local Node but remote in the same data center. Kubernetes also has support for Local Persistent Volumes that are physical volumes located on the same Node, but they are typically more expensive and used when you need high disk performance and volume.

Kubernetes cluster Mysql Nodes Storage

We have started setting up a Kubernetes cluster. On Production, we have 4 Mysql Nodes(2 Active Master, 2 Active slaves). Complete servers are on-premise, There is NO cloud providers usage.
Now how do I configure storage? I mean should I use PV / PVC? How will it work. Should I use local PV? Can someone explain to me this?
You need to use PersistentVolumes and PersistentVolumeClaims in order to achieve that.
A PersistentVolume (PV) is a piece of storage in the cluster that has
been provisioned by an administrator or dynamically provisioned using
Storage Classes.
A PersistentVolumeClaim (PVC) is a request for storage by a user.
Claims can request specific size and access modes (e.g., they can be
mounted once read/write or many times read-only).
Containers are ephemeral. When the container is restarted all the changes made prior to it are lost. Databases, however expect the data is persistent, therefore you need persistent volumes. You have to create a storage claim and the pod must be configured to mount the claimed storage.
Here you will find a simple guide showing how to deploy MySQL with a PersistentVolume. However, I strongly recommend getting familiar with the official docs that I have linked in order to fully understand the concept and adjust the access mode, class, size, etc according to your needs.
Please let me know if that helped.

DigitalOcean NFS vs do-block storage

I'm new to DigitalOcean and K8S and can't seem to wrap my head around this:
If I need to run multiple replica of Nginx containers, should I use block storage or NFS storage? I want static html data share by all the NGINX containers running in separate pods.
From my understanding, if I want to share data across multiple pods, I should be using NFS.
Taken from https://www.digitalocean.com/community/tutorials/how-to-set-up-readwritemany-rwx-persistent-volumes-with-nfs-on-digitalocean-kubernetes
The digitalocean-csi integrates a Kubernetes cluster with the DigitalOcean Block Storage product. A developer can use this to dynamically provision Block Storage volumes for containerized applications in Kubernetes. However, applications can sometimes require data to be persisted and shared across multiple Droplets. DigitalOcean’s default Block Storage CSI solution is unable to support mounting one block storage volume to many Droplets simultaneously. This means that this is a ReadWriteOnce (RWO) solution, since the volume is confined to one node. The Network File System (NFS) protocol, on the other hand, does support exporting the same share to many consumers. This is called ReadWriteMany (RWX), because many nodes can mount the volume as read-write. We can therefore use an NFS server within our cluster to provide storage that can leverage the reliable backing of DigitalOcean Block Storage with the flexibility of NFS shares.
Any clarification would be appreciated.

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new at Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8 pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up, creates its own TLS certificates which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy pasting these certificates into other pods)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. you will however only be able to have the pods read from the disk since GCP does not support "WRITEMANY".
Regarding hostpaths, the mount point is on the pod the volume is a directory on the node. I believe the hostpath is confined to individual nodes.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes