MinIO on Kubernetes: Best way to back it up

I have tried deploying it in the following two ways so far:
1. As a StatefulSet on Kubernetes, using an NFS persistent volume as its storage. I then learnt that MinIO should not be backed by NAS, because erasure coding should not be run on NAS data.
2. As a DaemonSet using local volumes. This time I attached separate disks to my nodes and labelled the nodes so that MinIO is scheduled to run on them. The disks are mounted on /data/minio on each of the nodes.
But now, every day, the nodes come under disk pressure and the MinIO pods get evicted. When I check the kubelet logs:
Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.968179 2188 kubelet_pods.go:1073] Killing unwanted pod "minio-kjrkc"
Aug 13 21:05:45 staging-node2 kubelet[2188]: I0813 21:05:45.975372 2188 kuberuntime_container.go:559] Killing container "docker://6da1247718f8e6c92399e231f8c31ff1c510737c658ac2aca87c1659aa6b51cc" with 30 second grace period
It tries to kill the pods, but the container never dies. Even when MinIO gets the termination signal, the container stays up.
What other options are left for an on-prem MinIO installation?
Is using local storage somehow not recommended, or am I using it incorrectly?
Any idea whether I have to explicitly configure a pre-hook for MinIO to accept the terminate signal?

The right approach to deploying MinIO on Kubernetes is a StatefulSet that uses local volumes on multiple distributed nodes. You can do this via YAML files, the Helm chart, or our MinIO Operator. Docs are available here: https://github.com/minio/minio/tree/master/docs/orchestration/kubernetes
"What other options are left for an on-prem minio installations?"
A StatefulSet with at least 4 pods and storage via hostPath volumes is generally the right way to deploy. Depending on the use case and existing infrastructure, other approaches may be relevant too.
"Is using local storage somehow not recommended or am I using it incorrectly ?"
Local storage is recommended. I'll need more info to see what exactly is going wrong here, but if you have dedicated local storage drives, you should be fine.
"Any idea if I have to explicitly configure any pre-hook for minio to accept the terminate signal ?"
MinIO handles termination signals itself. No pre-hook is needed.
Hope this helps. Please join us on our Slack channel at https://slack.min.io for a more detailed discussion.
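As a rough illustration of the StatefulSet layout described above, here is a minimal sketch that builds such a manifest as a plain Python dict. It is not taken from the MinIO docs: the names `minio` and `local-storage`, the `default` namespace, the 100Gi size, and the `minio/minio` image tag are assumptions, and it uses volumeClaimTemplates bound to a local-volume StorageClass instead of raw hostPath volumes. Credentials and the headless Service are left out for brevity.

```python
import json


def minio_statefulset(replicas=4, storage_class="local-storage", size="100Gi"):
    """Build a StatefulSet manifest for distributed MinIO as a plain dict."""
    if replicas < 4:
        raise ValueError("the answer above suggests at least 4 pods")
    # Pods are addressed through a headless Service named "minio" (assumed)
    # in the "default" namespace (assumed); {0...N} is expanded by MinIO.
    hosts = "http://minio-{0...%d}.minio.default.svc.cluster.local/data" % (replicas - 1)
    return {
        "apiVersion": "apps/v1",
        "kind": "StatefulSet",
        "metadata": {"name": "minio"},
        "spec": {
            "serviceName": "minio",
            "replicas": replicas,
            "selector": {"matchLabels": {"app": "minio"}},
            "template": {
                "metadata": {"labels": {"app": "minio"}},
                "spec": {
                    "containers": [{
                        "name": "minio",
                        "image": "minio/minio",  # assumed image name
                        "args": ["server", hosts],
                        "ports": [{"containerPort": 9000}],
                        "volumeMounts": [{"name": "data", "mountPath": "/data"}],
                    }],
                },
            },
            # One claim per pod, so every pod keeps its own local disk.
            "volumeClaimTemplates": [{
                "metadata": {"name": "data"},
                "spec": {
                    "accessModes": ["ReadWriteOnce"],
                    "storageClassName": storage_class,  # assumed class name
                    "resources": {"requests": {"storage": size}},
                },
            }],
        },
    }


if __name__ == "__main__":
    # kubectl accepts JSON as well as YAML, e.g. piped into `kubectl apply -f -`.
    print(json.dumps(minio_statefulset(), indent=2))
```

Because a StatefulSet gives each pod a stable name and its own claim, a restarted pod comes back to the same disk, which a DaemonSet with ad hoc mounts does not guarantee in the same way.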

Related

Store Winston js log files on GKE

I'm using winston for logging in my Node.js app. I'm deploying my dockerized app on GKE and want to store my log files outside the container. Where should I store those files, and what path should be mounted?
I'm really new to Kubernetes volumes and can't find the right tutorial to follow.
There are multiple options, but first: why do you want to store logs in files?
You can just push the logs to the Stackdriver service of GCP and they will be stored there. In that case there is no need for any volume or extra configuration.
Stackdriver Logging is a managed service where you can push the logs from all your containers and search them. This way your service stays stateless and you don't need to configure a volume for your deployment. Since the container or pod running inside the cluster is stateless, you can also scale the application easily.
If you still plan to use a volume, there are several options, described below:
Use Node volume:
With this option the container creates the log files inside a volume on the node it is running on.
Example : https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
Cons:
Logs are removed as soon as the node is scaled down or removed from the GKE cluster, for example during maintenance.
The container can be scheduled onto any node at any time, so the sequence of logs may end up split across nodes.
Use PVC disk:
If you want to keep your logs for a longer period, such as 30-40 days, a disk will work.
You create a volume that the pod uses in K8s. The pod can create and store files on this volume, and they won't be deleted from the disk unless you delete them.
Example : https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes
Cons:
With the ReadWriteOnce access mode, the volume can be mounted read-write by only a single node at a time, so it effectively serves a single pod (replica). This can cause issues if you plan to autoscale or want many replicas.
NFS & EFS
You have to use the ReadWriteMany access mode if you plan to store the logs of all replicas or pods in K8s.
For ReadWriteMany you can use any NFS service; on GCP the managed one is Filestore (EFS is the AWS equivalent).
In this case all your pods write to the single NFS volume and the logs are saved there, but extra configuration is required.
Example : https://medium.com/#Sushil_Kumar/readwritemany-persistent-volumes-in-google-kubernetes-engine-a0b93e203180
Extra:
The best option is to push logs to Stackdriver, which avoids most of the logging configuration and lets you manage the retention period there. Just start pushing logs from the application container and you can scale replicas seamlessly.
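If you do go the volume route, the access-mode trade-off above can be sketched as follows. This is only an illustration that builds the manifests as Python dicts; the claim name `app-logs`, the mount path `/var/log/app`, and the 10Gi size are hypothetical, and only the three access modes mentioned in this answer are modelled.

```python
import json

# Only the access modes discussed in this answer.
ACCESS_MODES = {"ReadWriteOnce", "ReadOnlyMany", "ReadWriteMany"}


def log_pvc(name="app-logs", access_mode="ReadWriteOnce", size="10Gi"):
    """Build a PersistentVolumeClaim manifest for a log volume."""
    if access_mode not in ACCESS_MODES:
        raise ValueError("unknown access mode: %s" % access_mode)
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": [access_mode],
            "resources": {"requests": {"storage": size}},
        },
    }


def log_volume_fragments(claim_name="app-logs", mount_path="/var/log/app"):
    """Return the pod-level `volumes` entry and container-level `volumeMounts` entry."""
    volume = {"name": "logs", "persistentVolumeClaim": {"claimName": claim_name}}
    mount = {"name": "logs", "mountPath": mount_path}
    return volume, mount


def writable_from_many_nodes(pvc):
    """True only when replicas on different nodes can all write to the claim."""
    return "ReadWriteMany" in pvc["spec"]["accessModes"]


if __name__ == "__main__":
    pvc = log_pvc(access_mode="ReadWriteMany")
    print(json.dumps(pvc, indent=2))
    print(writable_from_many_nodes(pvc))
```

The application would then point its file transport at the mount path, so the files land on the claimed volume instead of the container filesystem.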

minio for mariadb in kubernetes

I'm running a single-node k3s cluster with the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at MinIO to use on top of local-path as storage. But I'm not sure if it's the right choice, because my workloads primarily use MariaDB for data, and I read that an S3-compatible bucket isn't the best fit for database applications.
I hope you can help me figure this out.
If you don't want to use object storage, here are your options for running a local storage provisioner:
GlusterFS StorageClass
There isn't a lot of documentation on how to set it up, but if you know your way around GlusterFS it will be a good option.
local-path-provisioner
It provides a way for Kubernetes users to utilize the local storage on each node.
OpenEBS
It has a local volume storage engine, but I think this is not designed to work on a shared volume mount, and it ends up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
Longhorn [recommended]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
Rook
Rook is a storage operator for Kubernetes. It supports multiple storage backends. Don't use the NFS one, though, because we hit a wall when using it with our DBs.

Kubernetes Persistent Volume for Pod in Production

I ran a scenario in which I deployed a Microsoft SQL database on my K8s cluster using a PV and PVC. It works well, but I see some strange behaviour: I created a PV, but it is only visible on one node and not on the other worker nodes. What am I missing here? Any input is welcome.
Background:
Server 1 - Master
Server 2 - Worker
Server 3 - Worker
Server 4 - Worker
Pod: "MyDb" is running on Server (Node) 4 without any ReplicaSet. I am guessing that because my pod is running on server-4, the PV got created on server-4 when I created the pod and referred to the PVC (claim) in it.
Please let me know your thoughts on this issue, or share your input on mounting a shared disk in a production cluster.
Those who want to deploy a SQL DB on a K8s cluster can refer to the blog posts by Philips, linked below:
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-1/ (without PV)
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-2/ (with PV and Claim)
Regards,
Farooq
Please see below the findings for my original problem statement.
Problem: The pod for SQL Server was created. At runtime K8s created this pod on server-4 and hence created the PV on server-4. However, the PV path (/tmp/sqldata) wasn't created on the other nodes.
I shut down the server-4 node and ran the command to delete the SQL pod (no replica was used initially).
The status of the pod changed to "Terminating".
Nothing happened for a while.
I restarted server-4 and noticed the pod got deleted immediately.
Next step:
- I stopped server-4 again and created the same pod.
- The pod was created on the server-3 node at runtime, and I see the PV (/tmp/sqldata) was created on server-3 as well. However, all my data (for example, sample tables) was lost. It is a fresh new PV on server-3 now.
I had assumed the PV would be a mounted volume of external storage and not storage/disk from any node in the cluster.
"I am guessing because my POD is running on server-4, PV got created on server four when created POD and refer PVC (claim) in it."
This is more or less correct and you should be able to verify this by simply deleting the Pod and recreating it (since you say you do not have a ReplicaSet doing that for you). The PersistentVolume will then be visible on the node where the Pod is scheduled to.
Edit: The above assumes that you are using an external storage provider such as NFS or AWS EBS (see possible storage providers for Kubernetes). With HostPath the above does NOT apply and a PV will be created locally on a node (and will not be mounted to another node).
There is no reason to mount the PersistentVolume also to the other nodes. Imagine having hundreds of nodes, would you want to mount your PersistentVolume to all of them, while your Pod is just running on one?
You are also asking about "shared" disks. The PersistentVolume created in the blog post you linked is using ReadWriteMany, so you actually can start multiple Pods accessing the same volume (given that your storage supports that as well). But your software (a database in your case) needs to support having multiple processes accessing the same data.
Especially when considering databases, you should also look into StatefulSets, as they allow you to define pods that always use the same storage, which can be very useful for databases. Whether you should run databases on Kubernetes at all is a whole different topic...
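To make the node-bound nature of local storage explicit, here is a minimal sketch of a `local` PersistentVolume that declares which node holds the data. This is a different technique from the hostPath setup in the question, shown only for comparison: with a `local` PV the scheduler knows the data lives on one node and will not silently start the pod with an empty directory elsewhere. The PV name, the path `/mnt/sqldata`, and the `local-storage` class are hypothetical; the node name follows the question's naming.

```python
import json


def local_pv(name, node, path, size="10Gi", storage_class="local-storage"):
    """Build a PersistentVolume of type `local`, pinned to one node."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolume",
        "metadata": {"name": name},
        "spec": {
            "capacity": {"storage": size},
            "accessModes": ["ReadWriteOnce"],
            "persistentVolumeReclaimPolicy": "Retain",
            "storageClassName": storage_class,
            "local": {"path": path},
            # Local PVs must say which node owns the disk.
            "nodeAffinity": {
                "required": {
                    "nodeSelectorTerms": [{
                        "matchExpressions": [{
                            "key": "kubernetes.io/hostname",
                            "operator": "In",
                            "values": [node],
                        }],
                    }],
                },
            },
        },
    }


def usable_on(pv, node):
    """Check whether a pod on `node` satisfies the PV's hostname affinity."""
    terms = pv["spec"]["nodeAffinity"]["required"]["nodeSelectorTerms"]
    return any(
        expr["key"] == "kubernetes.io/hostname" and node in expr["values"]
        for term in terms
        for expr in term["matchExpressions"]
    )


if __name__ == "__main__":
    pv = local_pv("sqldata-pv", node="server-4", path="/mnt/sqldata")
    print(json.dumps(pv, indent=2))
    print(usable_on(pv, "server-3"))
```

With such a PV, a pod claiming it would stay unscheduled while server-4 is down instead of coming up on server-3 with a fresh directory.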

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from hostPath into the pod. (Not sure if this is feasible since I do not know how the data will behave if the pod starts on a new node in a multi node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up, creates its own TLS certificates which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy pasting these certificates into other pods)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. You will, however, only be able to have the pods read from the disk, since GCP persistent disks do not support the ReadWriteMany access mode.
Regarding hostPath: the mount point is in the pod, while the volume itself is a directory on the node. I believe a hostPath volume is confined to an individual node.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes

Kubernetes - Persistent storage for PostgreSQL

We currently have a 2-node Kubernetes environment running on bare-metal machines (no GCE) and now we wish to set up a PostgreSQL instance on top of this.
Our plan was to map a data volume for the PostgreSQL Data Directory to the node using the volumeMounts option in Kubernetes. However this would be a problem because if the Pod ever gets stopped, Kubernetes will re-launch it at random on one of the other nodes. Thus we have no guarantee that it will use the correct data directory on re-launch...
So what is the best approach for maintaining a consistent and persistent PostgreSQL Data Directory across a Kubernetes cluster?
One solution is to deploy HA PostgreSQL, for example https://github.com/sorintlab/stolon
Another is to have some network storage attached to all nodes (NFS, GlusterFS) and use volumeMounts in the pods.
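The second option could look roughly like the sketch below, which builds a pod manifest as a Python dict with an NFS volume mounted at the PostgreSQL data directory. The NFS server `nfs.example.internal`, the export path `/exports/pgdata`, and the Secret `postgres-secret` are hypothetical placeholders; the `postgres` image and its default data directory are the only parts taken as given.

```python
import json


def postgres_pod(nfs_server, nfs_path, secret_name="postgres-secret"):
    """Build a Pod manifest whose data directory lives on an NFS export."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "postgres", "labels": {"app": "postgres"}},
        "spec": {
            "containers": [{
                "name": "postgres",
                "image": "postgres",
                "env": [{
                    "name": "POSTGRES_PASSWORD",
                    # The password comes from a Secret (hypothetical name/key).
                    "valueFrom": {
                        "secretKeyRef": {"name": secret_name, "key": "password"},
                    },
                }],
                "ports": [{"containerPort": 5432}],
                "volumeMounts": [{
                    "name": "pgdata",
                    "mountPath": "/var/lib/postgresql/data",
                }],
            }],
            # The same export is reachable from every node, so the pod
            # finds its data wherever it is rescheduled.
            "volumes": [{
                "name": "pgdata",
                "nfs": {"server": nfs_server, "path": nfs_path},
            }],
        },
    }


if __name__ == "__main__":
    pod = postgres_pod("nfs.example.internal", "/exports/pgdata")
    print(json.dumps(pod, indent=2))
```

Only one PostgreSQL instance should use a given data directory at a time, so this pattern fits a single pod that may move between nodes, not several replicas sharing the export.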