Kubernetes Persistent Volume for Pod in Production

I executed a scenario where I deployed a Microsoft SQL database on my K8s cluster using a PV and PVC. It works well, but I see some strange behaviour: I created a PV, but it is only visible on one node and not on the other worker nodes. What am I missing here? Any inputs, please?
Background:
Server 1 - Master
Server 2 - Worker
Server 3 - Worker
Server 4 - Worker
Pod : "MyDb" is running on Server (Node) 4 without any replica set. I am guessing because my POD is running on server-4, PV got created on server four when created POD and refer PVC (claim) in it.
Please let me know your thoughts on this issue or share your inputs about mounting a shared disk in a production cluster.
Those who want to deploy a SQL database on a K8s cluster can refer to the blog posts by Phillips, linked below:
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-1/ (without PV)
https://www.phillipsj.net/posts/sql-server-on-linux-on-kubernetes-part-2/ (with PV and Claim)
Regards,
Farooq
Please see below my findings on my original problem statement.
Problem: The pod for SQL Server was created. At runtime, K8s scheduled this pod on server-4 and hence created the PV on server-4. However, the PV path (/tmp/sqldata) wasn't created on the other nodes.
I shut down the server-4 node and ran the command to delete the SQL pod (no ReplicaSet was used initially).
The status of the pod changed to "Terminating".
Nothing happened for a while.
I restarted server-4 and noticed the pod got deleted immediately.
Next Step:
- I stopped server-4 again and created the same pod.
- The pod was created on the server-3 node at runtime, and I see that the PV (/tmp/sqldata) was created on server-3 as well. However, all my data (e.g. sample tables) was lost. It is a fresh new PV on server-3 now.
I was assuming the PV would be a mounted volume of external storage, not storage/disk from a node in the cluster.
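For reference, a hostPath-backed PV and PVC along the lines of the linked blog posts might look roughly like the sketch below (names and sizes are assumed); hostPath means the directory lives on whichever node the pod lands on, which is exactly the behaviour described above.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: sqldata-pv               # assumed name
spec:
  capacity:
    storage: 10Gi                # assumed size
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /tmp/sqldata           # created on the node the pod is scheduled to
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqldata-pvc              # assumed name
spec:
  storageClassName: ""           # bind to the statically created PV above
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi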

I am guessing that because my pod is running on server-4, the PV got created on server-4 when I created the pod and referenced the PVC (claim) in it.
This is more or less correct, and you should be able to verify it by simply deleting the Pod and recreating it (since you say you do not have a ReplicaSet doing that for you). The PersistentVolume will then be visible on the node where the Pod is scheduled.
Edit: The above assumes that you are using an external storage provider such as NFS or AWS EBS (see the possible storage providers for Kubernetes). With hostPath the above does NOT apply: a PV will be created locally on a node and will not be mounted on any other node.
There is no reason to mount the PersistentVolume on the other nodes as well. Imagine having hundreds of nodes: would you want to mount your PersistentVolume to all of them while your Pod is only running on one?
You are also asking about "shared" disks. The PersistentVolume created in the blog post you linked is using ReadWriteMany, so you actually can start multiple Pods accessing the same volume (given that your storage supports that as well). But your software (a database in your case) needs to support having multiple processes accessing the same data.
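As an illustration only (the claim name is made up, and the backing storage must actually support ReadWriteMany), a shared claim and a pod mounting it could look like this; a second pod would simply reference the same claimName.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data              # assumed name
spec:
  accessModes:
    - ReadWriteMany              # requires RWX-capable storage (e.g. NFS)
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: reader
spec:
  containers:
    - name: reader
      image: busybox             # assumed image for illustration
      command: ["sh", "-c", "sleep 3600"]
      volumeMounts:
        - name: shared
          mountPath: /data
  volumes:
    - name: shared
      persistentVolumeClaim:
        claimName: shared-data   # any other pod can mount this same claim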
Especially when considering databases, you should also look into StatefulSets, as these basically allow you to define Pods that always use the same storage, which can be very interesting for databases. Whether you should run databases on Kubernetes at all is a whole different topic...
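A minimal StatefulSet sketch to show the idea, assuming the SQL Server image from the linked posts and the default storage class; volumeClaimTemplates gives each replica its own stable claim.

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mydb                     # assumed name
spec:
  serviceName: mydb              # a matching headless Service is assumed to exist
  replicas: 1
  selector:
    matchLabels:
      app: mydb
  template:
    metadata:
      labels:
        app: mydb
    spec:
      containers:
        - name: db
          image: mcr.microsoft.com/mssql/server:2019-latest   # assumed image
          volumeMounts:
            - name: data
              mountPath: /var/opt/mssql                        # SQL Server data directory
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi        # assumed size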

Related

Store Winston js log files on GKE

I'm using winston js for my Node.js app logging. I'm deploying my dockerized app on GKE and want to store my log files outside the container. Where should I store those files and what path should be mounted?
I'm really new to Kubernetes volumes and can't find the right tutorial to follow.
There are multiple options to follow, but first: why do you want to store logs in files at all?
You can just push the logs to GCP's Stackdriver service and your logs will be stored there; in that case, no volume or extra configuration is required.
Stackdriver Logging is a managed service: you can search the logs and push logs from all the containers. This way your service stays stateless and you don't need to configure a volume for your deployment. Since the container or pod runs statelessly inside the cluster, you can also scale the application easily.
Still, if you are planning to use a volume, there are plenty of options, described below:
Use a node volume (hostPath):
Here the container creates the log files inside a volume on the node it is running on (a minimal manifest sketch follows after the cons below).
Example : https://kubernetes.io/docs/concepts/storage/volumes/#hostpath
Cons:
- Logs will be removed as soon as the node is autoscaled away or removed from the GKE cluster during maintenance.
- The container can be scheduled to any node at any time, so the sequence of logs might become an issue.
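A minimal sketch of the hostPath approach, with a made-up application image, mount path and node directory:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-node-logs       # assumed name
spec:
  containers:
    - name: app
      image: node:18             # assumed image
      volumeMounts:
        - name: logs
          mountPath: /app/logs   # where the app writes its winston log files
  volumes:
    - name: logs
      hostPath:
        path: /var/log/myapp     # directory on whichever node the pod runs on
        type: DirectoryOrCreate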
Use a PVC-backed disk:
If you use a disk to store your logs for a longer retention period, say 30-40 days, this will work. You create a volume which is used by the pod in K8s; the pod can use this volume to create and store files, and they won't get deleted from that disk unless you delete them yourself (a sketch follows after the cons below).
Example : https://cloud.google.com/kubernetes-engine/docs/concepts/persistent-volumes
Cons:
- Only a single pod (replica) can attach to it if you are using the access mode ReadWriteOnce, which can create issues if you are planning to autoscale or want many replicas.
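A minimal sketch of this option on GKE, where the claim below would dynamically provision a persistent disk via the default storage class (names, sizes and the image are assumed):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-pvc                 # assumed name
spec:
  accessModes:
    - ReadWriteOnce              # only one node can mount the disk at a time
  resources:
    requests:
      storage: 20Gi              # assumed size
---
apiVersion: v1
kind: Pod
metadata:
  name: app-with-disk-logs       # assumed name
spec:
  containers:
    - name: app
      image: node:18             # assumed image
      volumeMounts:
        - name: logs
          mountPath: /app/logs
  volumes:
    - name: logs
      persistentVolumeClaim:
        claimName: logs-pvc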
NFS & EFS
You have to use the ReadWriteMany access mode if you want to store the logs of all replicas or pods in K8s.
For the ReadWriteMany option you can use any NFS service or GCP's Filestore service (GCP's managed NFS, comparable to AWS EFS).
In this case all your pods write to a single NFS or Filestore volume and the logs are saved there, but extra configuration is required.
Example : https://medium.com/@Sushil_Kumar/readwritemany-persistent-volumes-in-google-kubernetes-engine-a0b93e203180
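For illustration, an NFS-backed PersistentVolume that supports ReadWriteMany might look like the sketch below (server address, export path, names and sizes are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: logs-nfs-pv              # assumed name
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany              # many pods on many nodes can mount it
  nfs:
    server: 10.0.0.2             # placeholder NFS/Filestore server IP
    path: /exports/logs          # placeholder export path
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: logs-nfs-pvc             # assumed name
spec:
  storageClassName: ""           # bind to the pre-created PV above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 100Gi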
Extra:
The best option is to push logs to Stackdriver without doing much log configuration at all; you can manage the retention period there. Just start pushing logs from the application container and you can scale replicas seamlessly.

Is there an intermediate layer/cache between a Kubernetes pod and a persistent volume, or does a pod access the PV directly?

Recently I ran into a strange problem. We have two pods running in an OpenShift cluster that share a persistent volume (GlusterFS) between them.
Now, for the sake of this explanation, let's say one of the pods is PodA and the other is PodB. PodB had been running for three months. There is automation in PodA which creates/updates files in the shared persistent volume, and PodB reads them and performs some operation based on the input.
Now coming to the problem: whenever PodA created a new file in the shared PV, it was visible and accessible from PodB. However, there were a few files that PodA was updating periodically, and those changes were not reflected in PodB; in PodB we could only see the old versions of those files. To solve the problem, we forcefully deleted PodB; OpenShift recreated it and the problem was gone.
I thought that with the PV mechanism, Kubernetes mounts the external storage/folder into the pod (container) and that there is no intermediate layer or cache or anything like that. From what we have experienced so far, it seems every container (or pod) creates a local copy of those files, or maybe there is a cache in between (the PV and the pod).
I have searched for this on Google and could not find a detailed explanation of how this PV mount works in Kubernetes, and would love to know the actual reason behind this problem.
There is no caching mechanism for PVs provided by Kubernetes, so the problem you are observing must be located in either the GlusterFS CSI driver or GlusterFS itself.

Is Kubernetes local/csi PV content synced into a new node?

According to the documentation:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned ... It is a resource in the cluster just like a node is a cluster resource...
So I was reading about all the currently available plugins for PVs, and I understand that for 3rd-party / out-of-cluster storage this doesn't matter (e.g. storing data in EBS, Azure or GCE disks), because there are few or no implications when adding or removing nodes from a cluster. However, there are others, such as (ignoring hostPath, as that works only for single-node clusters):
csi
local
which (at least from what I've read in the docs) don't require 3rd-party vendors/software.
But also:
... local volumes are subject to the availability of the underlying node and are not suitable for all applications. If a node becomes unhealthy, then the local volume becomes inaccessible by the pod. The pod using this volume is unable to run. Applications using local volumes must be able to tolerate this reduced availability, as well as potential data loss, depending on the durability characteristics of the underlying disk.
The local PersistentVolume requires manual cleanup and deletion by the user if the external static provisioner is not used to manage the volume lifecycle.
Use-case
Let's say I have a single-node cluster with a single local PV and I want to add a new node to the cluster, so that I have a 2-node cluster (small numbers for simplicity).
Will the data from the already existing local PV be replicated 1:1 onto the new node (as in having one PV with two nodes of redundancy), or is it strictly bound to the existing node only?
If the already existing PV can't be adjusted from 1 to 2 nodes, can a new PV (created from scratch) be set up so that it is replicated 1:1 between 2+ nodes in the cluster?
Alternatively, if not, what would be the correct approach without using a 3rd-party out-of-cluster solution? Will using csi change the overall approach, or is it the same with regard to redundancy, just a different "engine" under the hood?
Can a new PV be created so it's 1:1 replicated between 2+ nodes on the cluster?
None of the standard volume types are replicated at all. If you can use a volume type that supports ReadWriteMany access (most readily NFS) then multiple pods can use it simultaneously, but you would have to run the matching NFS server.
Of the volume types you reference:
hostPath is a directory on the node the pod happens to be running on. It's not a directory on any specific node, so if the pod gets recreated on a different node, it will refer to the same directory but on the new node, presumably with different content. Aside from basic test scenarios I'm not sure when a hostPath PersistentVolume would be useful.
local is a directory on a specific node, or at least following a node-affinity constraint. Kubernetes knows that not all storage can be mounted on every node, so this automatically constrains the pod to run on the node that has the directory (assuming the node still exists).
csi is an extremely generic extension mechanism, so that you can run storage drivers that aren't on the list you link to. There are some features that might be better supported by the CSI version of a storage backend than the in-tree version. (I'm familiar with AWS: the EBS CSI driver supports snapshots and resizing; the EFS CSI driver can dynamically provision NFS directories.)
In the specific case of a local test cluster (say, using kind) using a local volume will constrain pods to run on the node that has the data, which is more robust than using a hostPath volume. It won't replicate the data, though, so if the node with the data is deleted, the data goes away with it.
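For reference, a local PersistentVolume carries its node-affinity constraint directly in the spec; a minimal sketch, assuming a made-up node name, path, size and a pre-existing local-storage StorageClass:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: local-pv                 # assumed name
spec:
  capacity:
    storage: 10Gi                # assumed size
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage  # assumed StorageClass (volumeBindingMode: WaitForFirstConsumer)
  local:
    path: /mnt/disks/ssd1        # assumed path on the node
  nodeAffinity:                  # pins the volume (and any pod using it) to one node
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - node-1         # assumed node name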

How do I mount data into persisted storage on Kubernetes and share the storage amongst multiple pods?

I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a K8s pod. I have the following requirements when I start a pod in a K8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from the hostPath into the pod. (Not sure if this is feasible, since I do not know how the data will behave if the pod starts on a new node in a multi-node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints, as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using configmaps to pass the configuration file)
The pod that comes up creates its own TLS certificates, which I need to pass to other pods. (Currently I do not have a process in place to do this and thus have been manually copy-pasting these certificates into other pods.)
So, how do I maintain a common persisted storage that handles sensitive data between multiple pods and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. You will, however, only be able to have the pods read from the disk, since GCE persistent disks do not support the ReadWriteMany access mode [2].
Regarding hostPath: the mount point is inside the pod, but the volume is a directory on the node, so a hostPath volume is confined to an individual node.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
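A rough sketch of the multi-reader setup from [1], assuming a pre-existing GCE disk that was populated with the data beforehand (disk, PV and claim names are placeholders; the in-tree gcePersistentDisk type is shown for brevity):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv           # assumed name
spec:
  capacity:
    storage: 10Gi                # assumed size
  accessModes:
    - ReadOnlyMany               # many pods can mount it, read-only
  gcePersistentDisk:
    pdName: my-preloaded-disk    # assumed pre-existing disk containing the data
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc          # assumed name
spec:
  storageClassName: ""           # bind to the pre-created PV above
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 10Gi

Pods would then reference the claim with readOnly: true in their persistentVolumeClaim volume source.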

Schedule legacy applications as single instance on Kubernetes

A lot of legacy applications are deployed as containers. Most of them only need a few changes to work in a container, but many of them are not built to scale, for example because they maintain session state or write to a volume (concurrency issues).
I was wondering whether those applications are intended to run on Kubernetes, and if so, what a good way to do so is. Pods are not durable, so the desired way to start an application is by using a replication controller with replicas set to 1. The RC ensures that the right number of pods is running. The documentation also specifies that it kills pods if there are too many. I was wondering if that's ever the case (if a pod is not started manually).
I guess a database like Postgres (with an external data volume) is a good example. I have seen tutorials deploying those using a replication controller.
Creating a Replication Controller with 1 replica is indeed a good approach; it's more reliable than starting a single pod, since you benefit from the self-healing mechanism: in case the node your app is running on dies, your pod will be terminated and restarted somewhere else.
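A minimal sketch of such a Replication Controller, assuming a generic Postgres image and a pre-existing claim named pgdata (both made up for illustration); on current clusters a Deployment or StatefulSet would typically be used instead, but the replicas: 1 plus self-healing idea is the same.

apiVersion: v1
kind: ReplicationController
metadata:
  name: postgres                 # assumed name
spec:
  replicas: 1                    # exactly one pod, rescheduled if its node dies
  selector:
    app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:15     # assumed image
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: pgdata    # assumed pre-existing claim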
Data persistence in the context of a cluster-management system like Kubernetes means that your data should be available outside the cluster itself (separate storage). I personally use EC2 EBS since our app runs on AWS, but Kubernetes supports a lot of other volume types. If your pod runs on node A, the volumes it uses will be mounted locally and made available inside your pod's containers. Now if your pod is destroyed and restarted on node B, this volume will be unmounted from node A and mounted on node B before the containers of your pod are recreated. Pretty neat.
Take a look at persistent volumes, this should be particularly interesting for you.