DigitalOcean NFS vs do-block storage - kubernetes

I'm new to DigitalOcean and K8S and can't seem to wrap my head around this:
If I need to run multiple replica of Nginx containers, should I use block storage or NFS storage? I want static html data share by all the NGINX containers running in separate pods.
From my understanding, if I want to share data across multiple pods, I should be using NFS.
Taken from https://www.digitalocean.com/community/tutorials/how-to-set-up-readwritemany-rwx-persistent-volumes-with-nfs-on-digitalocean-kubernetes
The digitalocean-csi integrates a Kubernetes cluster with the DigitalOcean Block Storage product. A developer can use this to dynamically provision Block Storage volumes for containerized applications in Kubernetes. However, applications can sometimes require data to be persisted and shared across multiple Droplets. DigitalOcean’s default Block Storage CSI solution is unable to support mounting one block storage volume to many Droplets simultaneously. This means that this is a ReadWriteOnce (RWO) solution, since the volume is confined to one node. The Network File System (NFS) protocol, on the other hand, does support exporting the same share to many consumers. This is called ReadWriteMany (RWX), because many nodes can mount the volume as read-write. We can therefore use an NFS server within our cluster to provide storage that can leverage the reliable backing of DigitalOcean Block Storage with the flexibility of NFS shares.
Any clarification would be appreciated.

Related

minio for mariadb in kubernetes

I'm running a k3s single node cluster and have the k3s local-path-provisioner as storage. As I want to be able to add nodes in the future, I looked at minio to use on top of the local-path as storage. But I'm not sure if it's the right choice, cause I my workloads primarily use mariadb for data and I read, that an s3 compatible bucket isn't the best for database applications.
I hope you can help me figure this out.
If you don't want to use object storage then here are your options for running a local storage provisioner:
GlusterFS StorageClass
Doesn't have lot of documentation on how to set it up. But if you know your way around GlusterFS It'll be a good option.
local-path-provisioner
I
t provides a way for the Kubernetes users to utilize the local storage in each node
OpenEBS -> has a local volume storage engine but I think this is not designed to work on a shared volume mount and it end up tying a pod to a specific node since the data "doesn't exist" on the other nodes.
longhorn [recommened]
It creates a dedicated storage controller for each block device volume and synchronously replicates the volume across multiple replicas stored on multiple nodes.
rook
Rook is a storage operators for Kubernetes, It supports multiple storage backends. Don't use the NFS one tho cause we hit a wall when using it with our DBs.

kubernetes persistent volume for bare metal accessible on all nodes and pods

I was trying the local or host path volumes on a LAN bare metal servers.
tried local but each node was having there own copy of the data.
How can i use volumes across all the nodes and pods.
Persistent Volumes have access semantics. Example on GCE if you are using a Persistent Disk, can either be mounted as writable to a single pod or to multiple pods as read-only. If you want multi writer semantics, you need to setup NFS or some other storage that let's you write from multiple pods. NFS can support multiple read/write clients.
In case you are interested in running NFS take a look: nfs-setup.
The NFS persistent volume and NFS claim gives an indirection that allow multiple pods to refer to the NFS server using a symbolic name rather than the hardcoded server address.
Take a look: pv-multiple-pods.
If you want to share data through your cluster, then you need to use network storage.
You can't expect kubernetes to just share your data accross all the nodes of your cluster. So local storage and host path won't work in that case.
As #MaggieO said, you can setup and use a NFS server.
If you just want to try it out, you can also use your favorite cloud provider storage solution (AWS S3, GCP Bucket, Azure Disk, etc). You can see the full list here

Share large block device among multiple Kubernetes pods via NFS keeping exports isolated per namespace

I host multiple projects on a Kubernetes cluster. Disk usage for media files is growing fast. My hosting provider allows me to create large block storage spaces, but these spaces can only be attached to a node (VPS) as a block device. For now I don’t consider switching to an object storage.
I want to use a cheap small VPS with a large block device attached to it as a NFS server for several projects (pods).
I've read some tutorials about using NFS as persistent volumes. The approaches are:
External NFS service. What about security? How to expose an export to one and only one pod inside the cluster?
ie, on the NFS server machine:
/share/
project1/
project2/
...
projectN/
Where each /share/project{i} must be only available to pods in project{i} namespace.
Multiple dockerized NFS services, using the affinity value to attach the nfs services to nfs server node.
I don't know if it's a good practice having many NFS server pods on the same node.
Maybe there are other approaches I'm not aware. What's the best Kubernetes approach for this use case?
There is no 1 answer for your questions.
It depends on your solution(architecture),requirements,security many others factors.
External NFS service. What about security?
In this case all consideration are on your side (my advice is to choose some supported solution by your cloud provided) please refer to Considerations when choosing a right solution.
As one example please read about security NFS Volume Security. In your case all responsibility are on administrator side to share volumes and provide appropriate security settings.
According to the second question.
You can use pv,pvc claim, namespaces and storage classes to achieve your goals.
Please refer to pv with nfs server and storage classes
Note:
For example, NFS doesn’t provide an internal provisioner, but an external provisioner can be used. Some external provisioners are listed under the repository kubernetes-incubator/external-storage. There are also cases when 3rd party storage vendors provide their own external provisioner .
For affinity rules please also refer to Allowed Topologies in case topology of provisioned volumes will be applied/restricted to specific zones.
Additional resources:
Kubernetes NFS-Client Provisioner
NFS Server Provisioner
Hope this help.

Apache Kafka - Volume Mapping for Message Log files in Kubernetes (K8s)

When we deploy apache kafka on Linux/Windows, we have log.dirs and broker.id properties. on bare metal, the files are saved on the individual host instances. However, when deployed via K8s on public cloud - there must be some form of volume mounting to make sure that the transaction log fils are saved somewhere?
Has anyone done this on K8s? I am not referring to Confluent (because it's a paid subscription).
As far as I understand you are just asking how to deal with storage in Kubernetes.
Here is a great clip that talks about Kubernetes Storage that I would recommend to You.
In Kubernetes you are using Volumes
On-disk files in a Container are ephemeral, which presents some problems for non-trivial applications when running in Containers. First, when a Container crashes, kubelet will restart it, but the files will be lost - the Container starts with a clean state. Second, when running Containers together in a Pod it is often necessary to share files between those Containers. The Kubernetes Volume abstraction solves both of these problems.
There is many types of Volumes, some are cloud specific like awsElasticBlockStore, gcePersistentDisk, azureDisk and azureFile.
There are also other types like glusterfs, iscsi, nfs and many more that are listed here.
You can also use Persistent Volumes which provides an API for users and administrators that abstracts details of how storage is provided from how it is consumed:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
Here is a link to Portworx Kafka Kubernetes in production: How to Run HA Kafka on Amazon EKS, GKE and AKS which might be handy for you as well.
And if you would be interested in performance then Kubernetes Storage Performance Comparison is a great 10min read.
I hope those materials will help you understand Kubernetes storage.

GCE volume mounts as compared to Kubernetes volume mounts

Kubernetes has pretty extensive volume and volume mounting support (many different volume types, subpaths, mounting single files).
Can the same be achieved with GCE VMs?
Update:
I have some Kubernetes workflow that uses NFS and GCE PD volumes.
Suppose I want to run the same workflow without Kubernetes (by just starting GCE VMs).
What volume-related features will I lose/keep?
Some examples of features:
Having the same volume shared between multiple producer Pods/VMs.
Mounting single files into container/VM (as opposed to mounting directories only).
The PVs and GCE PD volumes used by GKE use Google Persistent Disks and thus are bound by the same limitations. This also means that there isn't much you can do on k8s that you can't do on GCE. The major difference is the resources won't be as fluid.
You can attach a disk to a GCE VM and mount it as a subpath if you want at the OS level or just mount the entire disk normally. You can also use a single disk in readOnlyMany mode which can be shared by multiple VMs in the same zone (same restriction you have in GKE). If you need scalability, you can use a Managed Instance Group that uses a snapshot of your disk so that replication won't skew the data.
You can also mount NFS in GCE as in GKE.
Migrating from GKE to GCE generally does not have too many restrictions. The major difference is that you are moving from a managed orchestration system to an unmanaged VM so you may need to do some more leg work to make sure that there is scalability (if need be) and resiliency.
Aside from the benefits that k8s offers all around, I can't think of any major benefits you lose concerning the volumes specifically.