I have an application running on Kubernetes that needs to access SMB shares that are configured dynamically (host, credentials, etc) within said application. I am struggling to achieve this (cleanly) with Kubernetes.
I am facing several difficulties:
I do not want just "a" storage; I want explicitly specified SMB shares
These shares are dynamically defined within the application and not known beforehand
I have a variable number of shares, and a single pod needs to be able to access all of them
We currently have a solution where, on each Kubernetes worker node, all shares are mounted to mountpoints in a common folder. This folder is then given as a HostPath volume to the containers that need access to those storages. Finally, each of those containers has logic to access the subfolder(s) matching the storage(s) it needs.
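For illustration, a rough sketch of what this workaround looks like today (all names and paths below are made up for the example):

apiVersion: v1
kind: Pod
metadata:
  name: smb-consumer              # illustrative name
spec:
  containers:
    - name: app
      image: my-app:latest        # placeholder image
      volumeMounts:
        - name: smb-shares
          mountPath: /shares      # the app then picks the subfolder(s) it needs under here
  volumes:
    - name: smb-shares
      hostPath:
        path: /mnt/smb-shares     # folder on every node where all SMB shares are pre-mounted
        type: Directory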
The downsides, and the reason I'm looking for a cleaner alternative, are:
HostPath volumes present security risks
For this solution, I need something outside Kubernetes that mounts the SMB shares automatically on each Kubernetes node
Is there a better solution that I am missing?
The Kubernetes object that seems to match this approach the most closely is the Projected Volume, since it "maps existing volume sources into the same directory". However, it doesn't support the type of volume source I need and I don't think it is possible to add/remove volume sources dynamically without restarting the pods that use this Projected Volume.
Your current solution using HostPath on the nodes is certainly not flexible and not secure, so it is not good practice.
I think you should consider using one of the custom drivers for your SMB shares:
CIFS FlexVolume Plugin - older solution, not maintained
SMB CSI Driver - actively developed (recommended)
CIFS FlexVolume Plugin:
This solution is older and has been replaced by the CSI driver. Its advantage compared to CSI is that you can specify SMB shares directly in the pod definition (including credentials as a Kubernetes Secret), as you prefer.
Here you can find instructions on how to install this plugin on your cluster.
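For illustration, a pod using the fstab/cifs FlexVolume plugin looks roughly like this; the driver name and option keys depend on the plugin version you install, so treat this as a sketch rather than a copy-paste recipe:

apiVersion: v1
kind: Pod
metadata:
  name: cifs-example
spec:
  containers:
    - name: app
      image: busybox
      command: ["sleep", "3600"]
      volumeMounts:
        - name: smb-share
          mountPath: /data
  volumes:
    - name: smb-share
      flexVolume:
        driver: "fstab/cifs"        # name under which the plugin is installed on the nodes
        fsType: "cifs"
        secretRef:
          name: cifs-secret         # Kubernetes Secret holding username/password
        options:
          networkPath: "//smb-server/share"                    # illustrative share
          mountOptions: "dir_mode=0755,file_mode=0644,noperm"  # passed through to mount.cifs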
SMB CSI Driver:
This driver will automatically take care of mounting SMB shares on all nodes by using a DaemonSet.
You can install the SMB CSI Driver either with a bash script or with a Helm chart.
Assuming you have your SMB server ready, you can use one of the following solutions to access it from your pod:
Storage class
PV/PVC
In both cases you have to use a previously created secret with the credentials.
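For example, a Secret holding the share credentials could look like this (name, namespace and values are up to you):

apiVersion: v1
kind: Secret
metadata:
  name: smbcreds
  namespace: default
type: Opaque
stringData:
  username: "my-smb-user"        # replace with the real account
  password: "my-smb-password"    # replace with the real password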
In your case, for every SMB share you should create a Storage class / PV and mount it to the pod.
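A sketch of the storage class + PVC variant with the SMB CSI driver could look like the following; check the driver's documentation for the exact parameter names supported by your version, and note that the share path and secret name here are assumptions:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: smb-share-1
provisioner: smb.csi.k8s.io
parameters:
  source: "//smb-server/share1"                             # the SMB share this class points to
  csi.storage.k8s.io/node-stage-secret-name: "smbcreds"     # Secret created above
  csi.storage.k8s.io/node-stage-secret-namespace: "default"
reclaimPolicy: Retain
mountOptions:
  - dir_mode=0777
  - file_mode=0777
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: smb-share-1-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: smb-share-1
  resources:
    requests:
      storage: 10Gi              # nominal for SMB, but required by the API

The PVC can then be mounted into the pod like any other volume.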
The advantage of the CSI driver is that it is the newer, currently maintained solution, and it replaced FlexVolume.
[Diagram: how the CSI plugin operates]
Also check:
Kubernetes volume plugins evolution from FlexVolume to CSI
Introducing Container Storage Interface (CSI) Alpha for Kubernetes
Related
I'm using GKE to run K8s workloads and want to add TPU support. From the GCP docs, I "need" to attach a GCS bucket so the Job can read models and store logs. However, we already create shared NFS mounts for our K8s clusters. How hard of a requirement is it to "need" GCS to use TPUs? Can shared Filestore NFS mounts work just fine? What about using GCS Fuse?
I'm trying to avoid having the cluster user know about the back-end file system (NFS vs GCS), and just know that the files they provide will be available at "/home/job". Since the linked docs show passing a gs://mybucket/some/path value as needed for file system parameters, I'm not sure if a /home/job value will still work. Does the TPU access the filesystem directly and is it only compatible with GCS? Or do the nodes access the filesystem (preferring GCS) and then share the data (in memory) with the TPUs?
I'll try it out to learn the hard way (and report back), but curious if others have experience with this already.
I am trying to monitor filesystem usage for pods in k8s. I am using Kubernetes (microk8s) and hostpath persistent volumes. I am running Kafka along with a number of producers to see what happens when I go past the PVC size limit among other things. I have tried getting information from the API server but it is not reported there. Since it is only using hostpath, that kind of makes sense. It is not a dynamic volume system. Doing df on the host just shows all of the volumes with the same utilization as the root filesystem. This is the same result using exec -- df within the container. There are no pvcRefs on the containers using api server, which kind of explains why the dashboard doesn't have this information. Is this a dead end or does someone have a way around this limitation? I am now wondering if the PVC limits will be enforced.
Since with hostPath your data is stored directly on the worker node, you won't be able to monitor the usage. Using hostPath has many drawbacks, and while it's good for testing, it should not be used for a production system. Keeping the data directly on the node is dangerous, and in the case of node failure/replacement you will lose it. Other disadvantages are:
Pods created from the same pod template may behave differently on different nodes because of different hostPath file/dir contents on those nodes
Files or directories created with hostPath on the host are only writable by root, which means you either need to run your container process as root or modify the file permissions on the host to make them writable by a non-root user, which may lead to security issues
hostPath volumes should not be used with StatefulSets.
As you already found out, it would be a good idea to move away from hostPath towards something else.
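To illustrate the last point from the question (whether the PVC limit is enforced): with a hostPath-backed PV, the capacity field is only used to match PVCs against PVs; to my knowledge nothing on the node enforces it as a quota, so writes can continue until the node's filesystem fills up. A sketch (names and sizes are made up):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: kafka-data-pv
spec:
  capacity:
    storage: 1Gi                   # used for PVC matching only; not enforced on disk
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /var/local/kafka-data    # lives on whichever node the pod is scheduled to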
I developed a web application for our students and I would like to run it now in a Kubernetes container environment. Every user (who can be seen as a tenant) gets their own application environment (1:1 relation).
The application environment consists of 2 pods (1x webserver, 1x database), defined by a deployment and a service.
I am using Kubernetes v1.17.2 and I would like to use dynamic PersistentVolumeClaims together with the possibility to keep a specific user's (tenant's) data across the deletion and re-creation of a pod (e.g. when updating to a new application version or after a hardware reboot).
I thought about using an environment variable at pod creation (e.g. user-1, user-2, user-x, ...) and using this information to allow reuse of a dynamically created PersistentVolume.
Is there a best practice or concept for how this can be achieved?
best regards
shane
The outcome that you wish to achieve will be strongly connected to the solution that you are currently using.
It will differ between Kubernetes instances that are provisioned in the cloud (for example GKE) and Kubernetes instances on premises (for example kubeadm or kubespray).
Talking about the possibility to retain user data, please refer to the official documentation: Kubernetes.io: Persistent volumes reclaiming. It shows a way to retain the data behind a PVC.
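In practice this usually means either patching an existing PV's persistentVolumeReclaimPolicy to Retain, or using a StorageClass whose reclaimPolicy is Retain so that dynamically provisioned volumes are kept after their PVC is deleted. A minimal sketch, assuming a GKE-style provisioner (swap in the provisioner that matches your platform):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: retained-storage
provisioner: kubernetes.io/gce-pd   # example; use the provisioner available in your cluster
reclaimPolicy: Retain               # PVs survive deletion of their PVC
parameters:
  type: pd-standard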
Be aware that the local volume static provisioner does not support dynamic provisioning.
The local volume static provisioner manages the PersistentVolume lifecycle for pre-allocated disks by detecting and creating PVs for each local disk on the host, and cleaning up the disks when released. It does not support dynamic provisioning.
Github.com: Storage local static provisioner
Contrary to that, VMware vSphere supports dynamic provisioning. If you are using this solution, please refer to this documentation.
Your question lacks a specific explanation of the users in your environment. Are they inside your application or outside of it? Is the application authenticating the users? One solution would be to create users inside Kubernetes with service accounts and limit their view to a namespace created specifically for them.
For service account creation please refer to: Kubernetes.io: Configure service account.
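A minimal sketch of that approach, assuming one namespace per user (all names below are illustrative):

apiVersion: v1
kind: Namespace
metadata:
  name: user-1
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: user-1-sa
  namespace: user-1
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: user-1-edit
  namespace: user-1
subjects:
  - kind: ServiceAccount
    name: user-1-sa
    namespace: user-1
roleRef:
  kind: ClusterRole
  name: edit          # built-in role; grants read/write access within this namespace only
  apiGroup: rbac.authorization.k8s.io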
Additionally, you could also look at StatefulSets.
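With a StatefulSet each replica gets its own PVC from a volumeClaimTemplate, and that PVC is kept when the pod is deleted and re-created, which maps well to the per-tenant persistence you describe. A rough sketch, where the image, size and StorageClass name are assumptions:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: tenant-db
spec:
  serviceName: tenant-db
  replicas: 1
  selector:
    matchLabels:
      app: tenant-db
  template:
    metadata:
      labels:
        app: tenant-db
    spec:
      containers:
        - name: mysql
          image: mysql:5.7
          env:
            - name: MYSQL_ROOT_PASSWORD
              value: "changeme"          # placeholder; use a Secret in practice
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: standard       # assumed StorageClass
        resources:
          requests:
            storage: 5Gi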
I need a piece of advice / recommendation / link to tutorial.
I am designing a Kubernetes cluster, and one of the projects is a WordPress site with lots of pictures (a photo blog).
I want to be able to tear down and re-create my cluster within an hour, so all "persistent" pieces need to be hosted outside of cluster (say, separate linux instance).
It is doable with database - I will just have a MySQL server running on that machine and will update WP configs accordingly.
It is not trivial with filesystem storage. I am looking at Kubernetes volume providers, specifically NFS. I want to setup NFS server on a separate machine and have each WP pod use that NFS share through volume mechanism. In that case, I can rebuild my cluster any time and data will be preserved. Almost like database access, but filesystem.
The questions are the following. Does this solution seem feasible? Is there any better way to achieve same goal? Does Kubernetes NFS plugin support the functionality I need? What about authorization?
I am using a very similar strategy for my cluster: all my PVCs are placed on a standalone VM instance with a static IP that runs an NFS server, and a simple nfs-client-provisioner Helm chart is installed on my cluster.
So here is what I did:
Created a server (Ubuntu) and installed the NFS server on it. Reference here
Installed a Helm chart/app for the nfs-client-provisioner with these parameters:
nfs.path = /srv (the path on the server which is allocated to NFS and shared)
nfs.server = xx.yy.zz.ww (the IP of my NFS server configured above)
And that's it: the chart creates an nfs-client storage class which you can use to create a PVC and attach it to your pods.
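For example, a PVC bound to that storage class could look like this (nfs-client is the chart's default class name; the size is nominal for NFS):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-uploads
spec:
  accessModes:
    - ReadWriteMany              # NFS lets many pods mount the same share
  storageClassName: nfs-client   # storage class created by the provisioner chart
  resources:
    requests:
      storage: 20Gi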
Note - Make sure to configure the /etc/exports file on the NFS server to look like this, as mentioned in the referenced DigitalOcean document.
/srv kubernetes_node_1_IP(rw,sync,no_root_squash,no_subtree_check)
/srv kubernetes_node_2_IP(rw,sync,no_root_squash,no_subtree_check)
... and so on.
I am using these PVCs for some PHP and Laravel applications; they seem to work well without any considerable delays, although you will have to check against your specific requirements. HTH.
With Kubernetes one can define storage classes with provisioners. How does one find which provisioners are installed and available in the cluster?
Inspecting the storage classes will reveal which provisioners are already in use, but not whether there are more available.
A provisioner does not necessarily need to run in the cluster; e.g. the provisioner for an external storage appliance just connects to the cluster API server and watches for new persistent volume claims created with a storage class bound to its provisioner name. This is why, as of Kubernetes 1.7, there is no intended universal way to see whether a storage class's provisioner is actually available or not.
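What you can do is list the StorageClass objects and read their provisioner field (e.g. kubectl get storageclass -o yaml); each entry looks roughly like this, with the names here being just an example:

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: standard
provisioner: kubernetes.io/gce-pd   # the provisioner name this class is bound to
parameters:
  type: pd-standard
reclaimPolicy: Delete

That tells you which provisioner names are referenced, but, as said above, not whether each one is actually running and healthy.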