Kubernetes: strategy for out-of-cluster persistent storage

I need advice, a recommendation, or a link to a tutorial.
I am designing a Kubernetes cluster, and one of the projects is a WordPress site with lots of pictures (a photo blog).
I want to be able to tear down and re-create my cluster within an hour, so all "persistent" pieces need to be hosted outside of the cluster (say, on a separate Linux instance).
It is doable with the database: I will just have a MySQL server running on that machine and will update the WP configs accordingly.
It is not trivial with filesystem storage. I am looking at Kubernetes volume providers, specifically NFS. I want to set up an NFS server on a separate machine and have each WP pod use that NFS share through the volume mechanism. That way I can rebuild my cluster at any time and the data will be preserved. Almost like database access, but for the filesystem.
The questions are the following. Does this solution seem feasible? Is there any better way to achieve the same goal? Does the Kubernetes NFS plugin support the functionality I need? What about authorization?

I am using a very similar strategy for my cluster: all my PVCs are placed on a standalone VM instance with a static IP which has an NFS server running, plus a simple nfs-client-provisioner helm chart on my cluster.
So here is what I did:
Created a server (Ubuntu) and installed the NFS server on it. Reference here.
Installed a helm chart/app for the nfs-client-provisioner with these parameters:
nfs.path = /srv (the path on the server which is allocated to NFS and shared)
nfs.server = xx.yy.zz.ww (IP of my NFS server configured above)
And that's it: the chart creates an nfs-client storage class which you can use to create a PVC and attach to your pods (see the sketch after the note below).
Note - Make sure to configure the /etc/exports file on the NFS server to look like this, as mentioned in the referenced DigitalOcean document:
/srv kubernetes_node_1_IP(rw,sync,no_root_squash,no_subtree_check)
/srv kubernetes_node_2_IP(rw,sync,no_root_squash,no_subtree_check)
... and so on.
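As a sketch of that last step, a claim against the chart's storage class might look like this (nfs-client is the chart's default class name; the claim name and size here are placeholders). Your WordPress deployment would then mount this claim at its uploads directory:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wp-uploads
spec:
  storageClassName: nfs-client   # created by the nfs-client-provisioner chart
  accessModes:
    - ReadWriteMany              # NFS lets multiple WP pods share the volume
  resources:
    requests:
      storage: 50Gi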
I am using such PVCs for some PHP and Laravel applications; they seem to work well without any considerable delays, although you will have to check against your specific requirements. HTH.

Related

Dynamic storages in Kubernetes

I have an application running on Kubernetes that needs to access SMB shares that are configured dynamically (host, credentials, etc) within said application. I am struggling to achieve this (cleanly) with Kubernetes.
I am facing several difficulties:
I do not want "a" storage, I want explicitly specified SMB shares
These shares are dynamically defined within the application and not known beforehand
I have a variable amount of shares and a single pod needs to be able to access all of them
We currently have a solution where, on each Kubernetes worker node, all shares are mounted to mountpoints in a common folder. This folder is then given as a HostPath volume to the containers that need access to those storages. Finally, each of those containers contains logic to access the subfolder(s) matching the storage(s) it needs.
The downside, and the reason why I'm looking for a cleaner alternative, is:
HostPath volumes present security risks
For this solution, I need something outside Kubernetes that mounts the SMB shares automatically on each Kubernetes node
Is there a better solution that I am missing?
The Kubernetes object that seems to match this approach the most closely is the Projected Volume, since it "maps existing volume sources into the same directory". However, it doesn't support the type of volume source I need and I don't think it is possible to add/remove volume sources dynamically without restarting the pods that use this Projected Volume.
Your current solution using HostPath on the nodes is certainly inflexible and insecure, so it is not a good practice.
I think you should consider using one of the custom drivers for your SMB shares:
CIFS FlexVolume Plugin - older solution, not maintained
SMB CSI Driver - actively developed (recommended)
CIFS FlexVolume Plugin:
This solution is older and has been replaced by the CSI driver. Its advantage compared to CSI is that you can specify SMB shares directly in the pod definition (including credentials as a Kubernetes secret), as you prefer.
Here you can find instructions on how to install this plugin on your cluster.
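As a rough sketch of that inline style (assuming the commonly used fstab/cifs driver; the share path, secret name, and mount options below are placeholders):

apiVersion: v1
kind: Pod
metadata:
  name: smb-client
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: share
          mountPath: /mnt/share
  volumes:
    - name: share
      flexVolume:
        driver: "fstab/cifs"
        fsType: "cifs"
        secretRef:
          name: cifs-secret                    # secret holding the SMB username/password
        options:
          networkPath: "//smb-server/share1"   # placeholder share
          mountOptions: "dir_mode=0755,file_mode=0644"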
SMB CSI Driver:
This driver will automatically take care of mounting SMB shares on all nodes by using a DaemonSet.
You can install SMB CSI Driver either by bash script or by using a helm chart.
Assuming you have your SMB server ready, you can use one of the following solutions to access it from your pod:
Storage class
PV/PVC
In both cases you have to use a previously created secret with the credentials.
In your case, for every SMB share you should create a Storage class / PV and mount it to the pod.
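For illustration, a statically provisioned PV for one share could look roughly like this (the server name, share, and secret are placeholders; check the driver's documentation for the authoritative fields):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-share1
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  csi:
    driver: smb.csi.k8s.io
    volumeHandle: smb-server.example.com/share1   # must be unique per share
    volumeAttributes:
      source: //smb-server.example.com/share1
    nodeStageSecretRef:
      name: smbcreds                              # previously created credentials secret
      namespace: default

Each share gets its own PV (and matching PVC), which the pod then mounts alongside the others.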
The advantage of the CSI driver is that it is a newer, currently maintained solution that replaced FlexVolume.
Also check:
Kubernetes volume plugins evolution from FlexVolume to CSI
Introducing Container Storage Interface (CSI) Alpha for Kubernetes

How to have data in a database with FastAPI persist across multiple nodes?

If I use the https://github.com/tiangolo/full-stack-fastapi-postgresql project generator, how would one be able to persist data across multiple nodes (either with docker swarm or kubernetes)?
As I understand it, any postgresql data in a volumes directory would be different for every node (e.g. every DigitalOcean droplet). In this case, a user may ask for their data, get directed by traefik to a node with a different volumes directory, and receive different information than if they had been directed to another node. Is this correct?
If so, what would be the best approach to have multiple servers running a database work together and have the same data in the database?
On Kubernetes, persistent volumes are used to associate storage with pods wherever they are scheduled in the cluster. They are managed by providing the cluster with storage classes, which map to drivers, which in turn map to some kind of SAN storage.
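As a sketch of that pattern, a single-writer Postgres could run as a StatefulSet whose volumeClaimTemplate gives it one authoritative data volume that follows the pod; every app replica then talks to this one database over the network. The storage class name is provider-specific (do-block-storage is assumed here for DigitalOcean):

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: postgres
spec:
  serviceName: postgres
  replicas: 1
  selector:
    matchLabels:
      app: postgres
  template:
    metadata:
      labels:
        app: postgres
    spec:
      containers:
        - name: postgres
          image: postgres:13
          env:
            - name: POSTGRES_PASSWORD
              value: changeme            # for illustration only; use a Secret in practice
          volumeMounts:
            - name: data
              mountPath: /var/lib/postgresql/data
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: do-block-storage   # adjust per provider
        resources:
          requests:
            storage: 10Gi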
Docker / Docker Swarm has similar support via docker volume plugins, but with the ascendancy of K8s there are virtually no active open-source projects left, and most of the prior commercial SAN driver vendors have migrated to K8s instead.
Nonetheless, depending on your tolerance, you can use a mix of direct NFS / FUSE mounts; there are some not entirely abandoned docker volume drivers available in the NFS / GlusterFS space.
The issue moby/moby #39624 tracks CSI support, which we will hopefully see land in 2021 and which would bring Swarm back in line with K8s.

Mapping local directory to kubernetes

I am using Docker Desktop to run an application on the Kubernetes platform, and I need a location to store files. How can I point my local directory (c:\app-data) at the application running in Kubernetes?
I had a similar problem. Docker containers are usually meant to be throwaway/gateway containers, so people don't usually use them for storing files.
That being said, you have two options:
Add the path and files to the docker container, which will cause your docker container to be massive in size (NOT RECOMMENDED). Docker build will require substantial time and memory, as all the files will be copied. Here's an example of creating a local Ubuntu container with Docker: https://thenewstack.io/docker-basics-how-to-share-data-between-a-docker-container-and-host/
Host your files through another server/API, and fetch those files using simple requests in your app. I used this solution. The only caveat is you need to be able to host your files somehow. This is easy enough, but may require extra payment. https://www.techradar.com/best/file-hosting-and-sharing-services
You can't really do this. The right approach depends on what the data you're trying to store is.
If you're just trying to store data somewhere – perhaps it's the backing data for a MySQL StatefulSet – you can create a PersistentVolumeClaim like normal. Minikube includes a minimal volume provisioner so you should automatically get a PersistentVolume created; you don't need to do any special setup for this. But, the PersistentVolume will live within the minikube container/VM; if you completely delete the minikube setup, it could delete that data, and you won't be able to directly access the data from the host.
If you have a data set on the host that your container needs to access, there are a couple of ways to do it. Keep in mind that, in a "real" Kubernetes cluster, you won't be able to access your local filesystem at all. Creating a PersistentVolume as above and then running a pod to copy the data into it could be one approach; as @ParmandeepChaddha suggests in their answer, baking the data into the image is another reasonable approach (this can be very reasonable if the data is only a couple of megabytes).
If the data is the input or output data to your process, you can also consider restructuring your application so that it transfers that data over a protocol like HTTP. Set up a NodePort Service in front of your application, and use a tool like curl to HTTP POST the data into the service.
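A minimal sketch of that pattern (the port numbers, label, and upload path are placeholders):

apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  type: NodePort
  selector:
    app: myapp          # must match your application pods' labels
  ports:
    - port: 8000
      targetPort: 8000
      nodePort: 30080   # NodePorts must fall in the 30000-32767 range
# then, from the host:
#   curl -X POST --data-binary @input.dat http://localhost:30080/upload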
Finally, you could be considering a setup where all of the important data is local: you have some batch files on the local system, the job's purpose is to convert some local files to other local files, and it's just that the program is in minikube. (Or, similarly, you're trying to develop your application and the source files are on your local system.) In this case Kubernetes, as a distributed, clustered container system, isn't the right tool. Running the application directly on your system is the best approach; you can simulate this with a docker run -v bind mount, but this is inconvenient and can lead to permission and environment problems.
(In theory you can use a hostPath volume too, and minikube has some support to mount a host directory into the VM. In practice, the setup required to do this is as complex as the rest of your Kubernetes setup combined, and it won't be portable to any other Kubernetes installation. I wouldn't attempt this.)
You can mount your local directory to your kubernetes Pod using hostPath. Your path c:\app-data on your Windows host should be represented as either /C/app-data or /host_mnt/c/app-data, depending on your Docker Desktop version as suggested in this comment.
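For example, a minimal pod spec using such a mount (the image and container path are arbitrary; pick the host path variant that matches your Docker Desktop version):

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  containers:
    - name: app
      image: nginx
      volumeMounts:
        - name: app-data
          mountPath: /data
  volumes:
    - name: app-data
      hostPath:
        path: /host_mnt/c/app-data   # or /C/app-data on older Docker Desktop versions
        type: Directory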
You may also want to take a look at this answer.

Share large block device among multiple Kubernetes pods via NFS keeping exports isolated per namespace

I host multiple projects on a Kubernetes cluster. Disk usage for media files is growing fast. My hosting provider allows me to create large block storage spaces, but these spaces can only be attached to a node (VPS) as a block device. For now I am not considering switching to object storage.
I want to use a cheap small VPS with a large block device attached to it as a NFS server for several projects (pods).
I've read some tutorials about using NFS as persistent volumes. The approaches are:
External NFS service. What about security? How can I expose an export to one and only one pod inside the cluster?
ie, on the NFS server machine:
/share/
project1/
project2/
...
projectN/
Where each /share/project{i} must be available only to pods in the project{i} namespace.
Multiple dockerized NFS services, using affinity values to pin the NFS service pods to the NFS server node.
I don't know if it's a good practice having many NFS server pods on the same node.
Maybe there are other approaches I'm not aware. What's the best Kubernetes approach for this use case?
There is no single answer to your questions.
It depends on your solution (architecture), requirements, security, and many other factors.
External NFS service. What about security?
In this case all considerations are on your side (my advice is to choose a solution supported by your cloud provider); please refer to Considerations when choosing the right solution.
As one example, please read about security in NFS Volume Security. In your case, all responsibility is on the administrator's side to share volumes and provide appropriate security settings.
Regarding the second question:
You can use PVs, PVC claims, namespaces, and storage classes to achieve your goals (a sketch follows below the note).
Please refer to pv with nfs server and storage classes.
Note:
For example, NFS doesn’t provide an internal provisioner, but an external provisioner can be used. Some external provisioners are listed under the repository kubernetes-incubator/external-storage. There are also cases when 3rd party storage vendors provide their own external provisioner.
For affinity rules, please also refer to Allowed Topologies, in case the topology of provisioned volumes should be restricted to specific zones.
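As a concrete sketch of the PV/PVC approach (the server IP, sizes, and names are placeholders): create one PV per export and pre-bind it with claimRef, so that only the claim in that project's namespace can ever bind it:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: project1-media
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.5             # the NFS VPS
    path: /share/project1
  claimRef:                      # pre-bind: only this claim may bind the PV
    namespace: project1
    name: media
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: media
  namespace: project1
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""           # empty string skips dynamic provisioning
  resources:
    requests:
      storage: 100Gi

Repeat per project; a pod in project1 then mounts the media claim like any other PVC.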
Additional resources:
Kubernetes NFS-Client Provisioner
NFS Server Provisioner
Hope this helps.

How to get files into pod?

I have a fully functioning Kubernetes cluster with one master and one worker, running on CoreOS.
Everything is working and my pods and services are running fine. Now I have no clue how to proceed with my webserver idea.
Before I go further: I have no configs yet for the idea I am going to explain; I have just done a lot of research.
When setting up a pod (nginx) with a service, you get the default nginx page. After that you can set up a mounted volume with a host volume (volume mapping from host to container).
But let's say I want to separate every site (multiple sites, each in different pods): how can I let my users add files to their pod's nginx document root? Having FTP on the CoreOS node departs from the Kubernetes way and adds security vulnerabilities.
If someone can help me shed some light on this issue, that would be great.
Thanks for your time.
I'm assuming that you want to have multiple nginx servers running. The content of each nginx server is managed by a different admin (you called them users).
TL;DR:
Option 1: Each admin needs to build their own nginx docker image every time the static files change and deploy that new image. This is if you consider these static files as a part of the source-code of the nginx application
Option 2: Use a persistent volume for nginx; the init script of the nginx image should sync all its files from something like S3 and then start nginx
Before you proceed with building an application on Kubernetes, the most important thing is to separate your services into two conceptual categories, and to give up your desire to touch the underlying nodes directly:
1) Stateless: These are services that are built by the developers and can be released. They can be stopped, started, moved from one node to another, and have their filesystem reset during restart, and they will work perfectly fine. The majority of your web services will fit this category.
2) Stateful: These services cannot be stopped and restarted willy nilly like the ones above. Primarily, their underlying filesystem must be persistent and remain the same across runs of the service. Databases, file-servers and similar services are in this category. These need special care and should use k8s persistent-volumes and now stateful-sets.
Typical application:
nginx: build the nginx.conf into the docker image, and deploy it as a stateless service
rails/nodejs/python service: build the source code into the docker image, configure with env-vars, deploy as a stateless service
database: mount a persistent volume, configure with env-vars, deploy as a stateful service.
Separate sites:
Typically, I think at the level of a k8s deployment plus a k8s service. Each site can be one k8s deployment and service set. You can then expose each one separately (different external DNS names/IPs).
Application users storing files:
This is firmly in the category of a stateful service. Use a persistent volume to mount to a /media kind of directory
Developers changing files:
Say developers or admins want to use FTP to change the files that nginx serves. The correct pattern is to build a docker image with the new files and then use that docker image. If there are too many files, and you don't consider those files to be part of the 'source' of nginx, then use something like S3 and a persistent volume. In your docker image init script, don't start nginx directly: contact S3, sync all your files onto your persistent volume, then start nginx.
While the options and reasoning listed by iamnat are right, there's at least one more option to add to the list. You could consider using ConfigMap objects: maintain your files within the ConfigMap and mount them into your containers.
A good example can be found in the official documentation - check the Real World Example configuring Redis section to get some actionable input.
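For instance, a rough sketch of serving static content from a ConfigMap (the names and file content here are arbitrary):

apiVersion: v1
kind: ConfigMap
metadata:
  name: site-content
data:
  index.html: |
    <html><body>Hello from a ConfigMap</body></html>
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
    - name: nginx
      image: nginx
      volumeMounts:
        - name: content
          mountPath: /usr/share/nginx/html   # replaces the default document root
  volumes:
    - name: content
      configMap:
        name: site-content

Keep in mind that ConfigMaps are capped at about 1 MiB, so this fits configs and small static files rather than whole media libraries.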