Context:
We have an Apache NiFi cluster deployed in Kubernetes as a StatefulSet, and a volume claim template is used for the NiFi repositories.
These are the NiFi Helm charts we are using.
One of our use cases is file processing done by NiFi: file feeds are placed into a shared folder and NiFi reads them from there. When multiple NiFi nodes are present, all of them read from that same shared folder.
In a non-Kubernetes environment we use an NFS file share.
In AWS we use S3 for storage, and NiFi has processors to read from S3.
Problem:
NiFi is already deployed as a StatefulSet and uses a volume claim template for the repository storage. How can we mount the NFS share holding the file feed into all NiFi replicas?
Or, to put the question more generically:
How can we mount a single NFS shared folder into all StatefulSet replicas?
Solutions tried
We tried linking the separately claimed PVC folders to the NFS share, but that feels like a workaround.
Can somebody please help? Any hints would be highly appreciated.
Put it in the pod template like normal. NFS is a ReadWriteMany ("RWX") volume type, so you can create one PVC and then use it on every pod simultaneously. You can also configure an NFS volume directly in the pod spec, but using a PVC is probably better.
It sounds like what you have is correct :)
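For reference, a minimal sketch of what that looks like; the claim name, storage class and mount path below are assumptions, not taken from your chart:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nifi-feed-pvc            # hypothetical name for the shared feed claim
spec:
  accessModes:
    - ReadWriteMany              # NFS supports RWX, so one claim can back every replica
  storageClassName: nfs-client   # assumes an NFS-backed storage class exists in the cluster
  resources:
    requests:
      storage: 10Gi

In the StatefulSet pod template (alongside the existing volumeClaimTemplates for the repositories) the claim is referenced like any other volume:

      containers:
        - name: nifi
          volumeMounts:
            - name: file-feed
              mountPath: /opt/nifi/feed   # hypothetical path the feed processors would watch
      volumes:
        - name: file-feed
          persistentVolumeClaim:
            claimName: nifi-feed-pvc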
Related
I need some advice on my problem, as I cannot find a suitable solution.
I have a k8s cluster with several nodes and Directus deployed across several pods. My developers want to extend Directus with custom extensions, which is done by uploading the source files into the /extension folder.
This means every pod needs to share the /extension folder to access the files, so my thought was to use a shared PVC.
Basically, I can set up an NFS PVC with RWX to be shared between the pods and mounted as /extension.
BUT: how can I deploy the source code and folder structure onto this PVC? I would need to access the filesystem externally, either via a local mount or via GitHub Actions, to deploy code changes. But NFS does not support any authentication method, so I would open the gates of hell if I exposed the NFS port outside the private network.
Is there any other RWX PVC storage solution that could be used, ideally with at least local access options?
I could create the PVC, access it via kubectl, build the folder structure Directus needs, and push code via kubectl cp. But this seems a mess in production.
In the meantime I have proceeded with the following stack:
An NFS pod mounts a block-storage PV (RWO) and provides an NFS PVC to the cluster
Directus mounts the NFS PVC at /directus/extensions
Filebrowser mounts the NFS PVC at /srv
So basically I used the filebrowser/filebrowser (GitHub) container to serve the NFS PVC content (the Directus extensions folder) to developers over an HTTPS interface. This lets them upload new files manually, directly onto the NFS mount, where they are picked up by the app.
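A rough sketch of how the two workloads share the claim; the claim name, image and replica count here are assumptions:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: directus
spec:
  replicas: 3
  selector:
    matchLabels:
      app: directus
  template:
    metadata:
      labels:
        app: directus
    spec:
      containers:
        - name: directus
          image: directus/directus          # hypothetical image tag
          volumeMounts:
            - name: extensions
              mountPath: /directus/extensions
      volumes:
        - name: extensions
          persistentVolumeClaim:
            claimName: extensions-nfs-pvc   # the shared RWX NFS-backed claim

The Filebrowser Deployment mounts the same claimName at /srv, which is how both workloads see the same extension files.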
This seems appropriate for the development phase, but I doubt it could work in production, for these reasons:
No CI/CD integration is possible
Restarting the Filebrowser container requires manual interaction to secure the pod, since the project does not provide a .env config for a proper k8s deployment
So I am still looking for a solution to push code changes onto a Kubernetes NFS mount. Any dockerized service in mind?
I have a Helm + Kubernetes setup. I need to store a large file (~30-80 MB) in the cluster and mount it into pods. How do I achieve this so that I don't have to manually upload the file to every environment?
You can share common files using NFS. There are many ways to use NFS with K8s, such as this one. If your cluster is managed by a cloud provider such as AWS, you can consider EFS, which is NFS-compatible; NFS-compatible solutions are very common on cloud platforms today. This way you never need to manually upload files to worker nodes. Your Helm chart then only needs to create the necessary PersistentVolume/PersistentVolumeClaim and the volume mount to access the shared files.
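As a rough sketch, assuming a statically provisioned NFS (or EFS) export; the server address, path, names and size are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-files-pv
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.10          # placeholder: NFS server or EFS mount target address
    path: /shared
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-files-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""         # empty class so the claim binds to the static PV above
  resources:
    requests:
      storage: 1Gi

Every pod that mounts shared-files-pvc then sees the same files.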
One way to do this is to use a Helm install/upgrade hook AND an init container.
Set a Helm install hook to create a Kubernetes Job that downloads the file to the mounted volume.
The init container on the pod will wait indefinitely until the download is complete.
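A minimal sketch of that pattern, assuming the shared claim from above is called shared-files-pvc; the URL, images and marker file are placeholders:

apiVersion: batch/v1
kind: Job
metadata:
  name: download-shared-file
  annotations:
    "helm.sh/hook": post-install,post-upgrade     # run the Job on helm install and upgrade
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: downloader
          image: curlimages/curl
          command: ["sh", "-c", "curl -fSL -o /data/big-file.bin https://example.com/big-file.bin && touch /data/.done"]
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: shared-files-pvc

The init container on the application pod then simply waits for the marker file:

  initContainers:
    - name: wait-for-file
      image: busybox
      command: ["sh", "-c", "until [ -f /data/.done ]; do echo waiting for file; sleep 5; done"]
      volumeMounts:
        - name: shared
          mountPath: /data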
I am new to Kubernetes and am trying to understand the most efficient and secure way to handle sensitive persisted data that interacts with a k8s pod. I have the following requirements when I start a pod in a k8s cluster:
The pod should have persisted storage.
Data inside the pod should be persistent even if the pod crashes or restarts.
I should be able to easily add or remove data from a hostPath into the pod. (Not sure if this is feasible, since I do not know how the data will behave if the pod starts on a new node in a multi-node environment. Do all nodes have access to the data on the same hostPath?)
Currently, I have been using StatefulSets with a persistent volume claim on GKE. The image that I am using has a couple of constraints as follows:
I have to mount a configuration file into the pod before it starts. (I am currently using ConfigMaps to pass the configuration file; a minimal sketch of this is shown after this list.)
The pod that comes up creates its own TLS certificates, which I need to pass to other pods. (Currently I do not have a process in place for this, so I have been manually copy-pasting these certificates into other pods.)
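A minimal sketch of the ConfigMap approach mentioned in the first constraint; the names and paths are placeholders, not from the actual setup:

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
data:
  app.conf: |
    # configuration contents go here

In the StatefulSet pod template the ConfigMap is then mounted as a file before the container starts:

      containers:
        - name: app
          volumeMounts:
            - name: config
              mountPath: /etc/app       # hypothetical directory the image reads its config from
      volumes:
        - name: config
          configMap:
            name: app-config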
So, how do I maintain common persisted storage that handles sensitive data between multiple pods, and how do I add pre-configured data to this storage? Any guidance or suggestions are appreciated.
I believe this documentation on creating a persistent disk with multiple readers [1] is what you are looking for. You will, however, only be able to have the pods read from the disk, since GCE persistent disks do not support ReadWriteMany [2].
Regarding hostPath: the mount point is in the pod, but the volume is a directory on the node, so a hostPath is confined to the individual node it lives on.
[1] https://cloud.google.com/kubernetes-engine/docs/how-to/persistent-volumes/readonlymany-disks
[2] https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
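A rough sketch of the ReadOnlyMany pattern from [1], assuming a pre-created GCE persistent disk that has already been populated with the data; the disk and object names are placeholders:

apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: shared-data-disk   # pre-created and pre-populated GCE disk
    fsType: ext4
    readOnly: true
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  accessModes:
    - ReadOnlyMany
  storageClassName: ""         # bind to the static PV above
  resources:
    requests:
      storage: 100Gi

Pods then reference shared-data-pvc with readOnly: true in their volume definition.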
When we deploy Apache Kafka on Linux/Windows, we have the log.dirs and broker.id properties. On bare metal, the files are saved on the individual host instances. However, when deploying via K8s on a public cloud, there must be some form of volume mounting to make sure the transaction log files are saved somewhere?
Has anyone done this on K8s? I am not referring to Confluent (because it's a paid subscription).
As far as I understand, you are just asking how to deal with storage in Kubernetes.
Here is a great clip about Kubernetes storage that I would recommend to you.
In Kubernetes you use Volumes:
On-disk files in a Container are ephemeral, which presents some problems for non-trivial applications when running in Containers. First, when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state. Second, when running Containers together in a Pod it is often necessary to share files between those Containers. The Kubernetes Volume abstraction solves both of these problems.
There are many types of Volumes; some are cloud-specific, like awsElasticBlockStore, gcePersistentDisk, azureDisk and azureFile.
There are also other types like glusterfs, iscsi, nfs and many more that are listed here.
You can also use Persistent Volumes, which provide an API for users and administrators that abstracts the details of how storage is provided from how it is consumed:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
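For Kafka specifically, the usual pattern is a StatefulSet with a volumeClaimTemplate, so each broker gets its own PersistentVolume for log.dirs. A rough fragment, with the image, size and mount path as assumptions:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: kafka
spec:
  serviceName: kafka
  replicas: 3
  selector:
    matchLabels:
      app: kafka
  template:
    metadata:
      labels:
        app: kafka
    spec:
      containers:
        - name: kafka
          image: bitnami/kafka                  # hypothetical image choice
          volumeMounts:
            - name: data
              mountPath: /var/lib/kafka/data    # point log.dirs at this path
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 50Gi

Each replica gets its own claim (data-kafka-0, data-kafka-1, ...), and broker.id is typically derived from the pod ordinal.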
Here is a link to Portworx's "Kafka Kubernetes in production: How to Run HA Kafka on Amazon EKS, GKE and AKS", which might be handy for you as well.
And if you are interested in performance, then Kubernetes Storage Performance Comparison is a great 10-minute read.
I hope those materials will help you understand Kubernetes storage.
I need a piece of advice / recommendation / link to tutorial.
I am designing a Kubernetes cluster, and one of the projects is a WordPress site with lots of pictures (a photo blog).
I want to be able to tear down and re-create my cluster within an hour, so all "persistent" pieces need to be hosted outside the cluster (say, on a separate Linux instance).
It is doable with the database: I will just have a MySQL server running on that machine and update the WP configs accordingly.
It is not trivial with filesystem storage. I am looking at Kubernetes volume providers, specifically NFS. I want to set up an NFS server on a separate machine and have each WP pod use that NFS share through the volume mechanism. That way I can rebuild my cluster at any time and the data will be preserved. Almost like database access, but for the filesystem.
The questions are the following: does this solution seem feasible? Is there a better way to achieve the same goal? Does the Kubernetes NFS plugin support the functionality I need? What about authorization?
I am using a very similar strategy for my cluster: all my PVCs are placed on a standalone VM instance with a static IP that runs an NFS server, and a simple nfs-client-provisioner Helm chart runs on the cluster.
So here is what I did:
Created a server (Ubuntu) and installed the NFS server on it. Reference here.
Installed the nfs-client-provisioner Helm chart with the following parameters:
nfs.path = /srv (the path on the server that is allocated to NFS and shared)
nfs.server = xx.yy.zz.ww (the IP of the NFS server configured above)
And that's it: the chart creates an nfs-client storage class that you can use to create a PVC and attach to your pods (see the sketch after the exports example below).
Note: make sure to configure the /etc/exports file on the NFS server to look like this, as mentioned in the referenced DigitalOcean document:
/srv kubernetes_node_1_IP(rw,sync,no_root_squash,no_subtree_check)
/srv kubernetes_node_2_IP(rw,sync,no_root_squash,no_subtree_check)
... and so on.
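For reference, a claim using that storage class might look like this; the name and size are just placeholders:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: wordpress-uploads
spec:
  storageClassName: nfs-client   # the class created by the provisioner chart
  accessModes:
    - ReadWriteMany              # every WP pod can mount the same share
  resources:
    requests:
      storage: 20Gi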
I am using PVCs like this for some PHP and Laravel applications; they seem to work well without any noticeable delays, although you will have to check against your specific requirements. HTH.