I have an application which accesses a couple of files from a directory. I went through Kubernetes volumes, persistent volumes and persistent volume claims. This is a multi-node Kubernetes cluster. Is there any direct solution that does not need external storage like an NFS server?
I have a VM server from where I execute my Kubernetes commands. I am new to Kubernetes, so please help me with this.
Also, I was looking at local persistent volumes. Is there a link I can go through for an example? I am looking at https://kubernetes.io/docs/concepts/storage/volumes/#local
But the example does not explain what the fields below are under nodeAffinity:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 100Gi
  # volumeMode field requires BlockVolume Alpha feature gate to be enabled.
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node
This depends on your use case. If the files you want to share across the cluster are more than a few megabytes in size, you'll need some kind of storage operator; local storage is probably not what you're looking for.
For small files (configs, keys, init scripts)
If the files are small, such as configuration files or SSH keys, you can use a Kubernetes ConfigMap (or Secret). This lets you set up a few files, or directories containing a few files. Check out the documentation.
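As a minimal sketch (the ConfigMap name, file name and mount path below are made up for illustration), a ConfigMap can be mounted into a Pod as a read-only directory of files:

apiVersion: v1
kind: ConfigMap
metadata:
  name: my-app-config              # hypothetical name
data:
  app.properties: |
    log.level=INFO
    feature.x.enabled=true
---
apiVersion: v1
kind: Pod
metadata:
  name: my-app
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "cat /etc/config/app.properties && sleep 3600"]
    volumeMounts:
    - name: config-volume
      mountPath: /etc/config       # the ConfigMap keys show up as files here
  volumes:
  - name: config-volume
    configMap:
      name: my-app-config

The same pattern works with kind: Secret and a secret: volume source for keys and credentials.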
For large files (shared data, graphics, binaries)
If, however, you want to share a few hundred megabytes or gigabytes of files, you need a storage provider for your cluster.
If you are using a cloud provider such as Google, AWS or Azure, this should be straightforward: create a persistent disk with your cloud provider and copy your required data onto it. Once that's done, simply follow the guide for the relevant cloud provider (a rough sketch for the GCE case follows after this list):
Google Cloud - GCE Persistent Disk
AWS - Elastic Block Storage
Azure - Azure Disk
(#justcompile pointed out that AWS doesn't support multiple read-only mounts to instances; I was unable to find similar information for Azure.)
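For illustration only, a PersistentVolume backed by a pre-created GCE persistent disk could look roughly like this (the disk name my-data-disk is hypothetical and must already exist in the cluster's zone, with your data copied onto it):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv                    # hypothetical name
spec:
  capacity:
    storage: 100Gi
  accessModes:
  - ReadOnlyMany                   # a GCE PD can be mounted read-only by many nodes
  gcePersistentDisk:
    pdName: my-data-disk           # hypothetical, pre-created disk holding the shared files
    fsType: ext4
    readOnly: true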
If, however, you're running your own Kubernetes cluster on "bare metal", you'll have to set up either an NFS server or a Ceph cluster, probably with something like Rook on top.
Related
I am using Azure Kubernetes and I have created a Persistent Volume, Claims and a Storage Class.
I want to deploy Pods on the Persistent Volume so we can increase the volume anytime as the requirement grows. Right now our Pods are deployed on the virtual machine's OS disk. Since we are using the default Pod deployment on the VM disk, when we run out of disk space the whole cluster has to be destroyed and created again.
Please let me know how I can configure Pods to deploy on an Azure (Managed) Disk.
Thanks,
Mrugesh
You don't have to create a PersistentVolume manually; if you want an Azure Disk, it can be created dynamically for you.
From Azure Built-in storage classes:
The default storage class provisions a standard SSD Azure disk.
Standard storage is backed by Standard SSDs and delivers cost-effective storage while still delivering reliable performance.
You only have to create the PersistentVolumeClaim with the storage class you want to use, e.g.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: azure-managed-disk
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: default
  resources:
    requests:
      storage: 5Gi
and then refer to that PVC in your Deployment or Pods.
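For illustration, a Pod could reference that claim roughly like this (the Pod name, image and mount path are arbitrary):

apiVersion: v1
kind: Pod
metadata:
  name: my-pod                     # hypothetical name
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /mnt/azure        # arbitrary mount path inside the container
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: azure-managed-disk   # the PVC defined above

The same volumes/volumeMounts stanza goes into the Pod template of a Deployment.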
I have a deployment.yaml and it uses a PersistentVolumeClaim like so:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: mautic-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 5Gi
  storageClassName: standard
I am trying to scale my deployment horizontally using the Horizontal Pod Autoscaler, but when I scale the deployment, the remaining pods are stuck in ContainerCreating, and this is the error I get when I describe the pod:
Unable to attach or mount volumes: unmounted volume
What am I doing wrong here?
Using Deployment is great if your app can scale horizontally. However, using a Persistent Volume with a PersistentVolumeClaim can be challenging when scaling horizontally.
Persistent Volume Claim - Access Modes
A PersistentVolumeClaim can be requested for a few different Access Modes:
ReadWriteOnce (most common)
ReadOnlyMany
ReadWriteMany
ReadWriteOnce is the most commonly available and is the typical behavior for a local disk. But to scale your app horizontally you need a volume that is available from multiple nodes at the same time, so only ReadOnlyMany and ReadWriteMany are viable options. You need to check what access modes are available for your storage system.
In addition, if you use a regional cluster from a cloud provider, it spans three Availability Zones, and a volume typically lives in only one Availability Zone. So even if you use ReadOnlyMany or ReadWriteMany access modes, the volume is available on multiple nodes in the same AZ, but not in all three AZs of your cluster. You might consider using a storage class from your cloud provider that is replicated to multiple Availability Zones, but it typically costs more and is slower.
Alternatives
Since only ReadWriteOnce is commonly available, you might look for better alternatives for your app.
Object Storage
Object Storage, or buckets, is a common way to handle file storage in the cloud instead of using filesystem volumes. With Object Storage you access your files via an API over HTTP. See e.g. AWS S3 or Google Cloud Storage.
StatefulSet
You could also consider a StatefulSet, where each instance of your app gets its own volume. This makes your app distributed but typically not horizontally scalable. Here, your app usually needs to implement replication of the data itself, often using Raft, so this is a more advanced alternative.
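A rough sketch of that pattern (names, image and size are made up) uses volumeClaimTemplates so that each replica gets its own PersistentVolumeClaim:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: my-app                     # hypothetical name
spec:
  serviceName: my-app
  replicas: 3
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: app
        image: nginx
        volumeMounts:
        - name: data
          mountPath: /var/lib/my-app   # each replica only sees its own volume
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: 5Gi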
Can't share a PVC with multiple pods on GCP (with the GCP CLI)
When I apply the config with ReadWriteOnce, it works at once:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: <name>
  namespace: <namespace>
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 50Gi
But with ReadWriteMany the status hangs on pending
Any ideas?
So it is normal that the config works at once when you apply it with ReadWriteOnce; that's the rule.
ReadWriteOnce is the most common use case for Persistent Disks and works as the default access mode for most applications.
GCE persistent disks do not support ReadWriteMany!
Instead of ReadWriteMany, you can just use ReadOnlyMany.
More information you can find here: persistentdisk. But as you can see, the result will not be the same as what you want.
If you want to share volumes you could try some workarounds:
You can create services.
Your service should look after the data that is related to its area of concern and should allow other services to access this data via an interface. Multi-service access to data is an anti-pattern akin to global variables in OOP.
If you were looking to write logs, you should have a log service which each service can call with the relevant data it needs to log. Writing directly to a shared disk means that you'd need to update every container if you change your log directory structure or add extra features.
Also try to use high-performance, fully managed file storage for applications that require a file system interface and a shared file system.
More information you can find here: access-fileshare.
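For instance, a Filestore share can be consumed as an NFS-backed PersistentVolume with ReadWriteMany (the server IP and share path below are placeholders for your Filestore instance):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-pv               # hypothetical name
spec:
  capacity:
    storage: 1Ti                   # illustrative; the actual share size is set in Filestore itself
  accessModes:
  - ReadWriteMany
  nfs:
    server: 10.0.0.2               # placeholder: your Filestore instance IP
    path: /my-share                # placeholder: your Filestore file share name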
Referring to the Kubernetes documentation, GCE does not support ReadWriteMany storage: https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
There are some options here:
How to share storage between Kubernetes pods?
https://cloud.google.com/filestore/docs/accessing-fileshares
I have a multi-node (2) Kubernetes cluster running on bare metal. I understand that 1. hostPath is bad for production and 2. hostPath Persistent Volumes are not supported for multi-node setups. Is there a way that I can safely run apps that are backed by a SQLite database? Over NFS the database locks a lot and really hurts the performance of the apps.
I would probably place the SQLite databases for each app on the hostPath volume and everything would run smoothly again. But I was wondering if there are some workarounds to achieve this, even if I have to restrict apps to a specific node.
It seems you should use Local Persistent Volumes, which are GA.
As per documentation:
A local volume represents a mounted local storage device such as a disk, partition or directory.
Compared to hostPath volumes, local volumes can be used in a durable and portable manner without manually scheduling Pods to nodes, as the system is aware of the volume’s node constraints by looking at the node affinity on the PersistentVolume.
However:
At GA, Local Persistent Volumes do not support dynamic volume provisioning.
More information you can find here, and here.
As one example:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  capacity:
    storage: 100Gi
  # volumeMode field requires BlockVolume Alpha feature gate to be enabled.
  volumeMode: Filesystem
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Delete
  storageClassName: local-storage
  local:
    path: /mnt/disks/ssd1
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - example-node
With Local Persistent Volumes, the Kubernetes scheduler ensures that a pod using a Local Persistent Volume is always scheduled to the same node.
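To make that work, the local-storage StorageClass referenced in the example is typically created with volumeBindingMode: WaitForFirstConsumer and no provisioner, and the app then claims it as usual (a sketch; the claim name and size are illustrative):

kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local volumes have no dynamic provisioning
volumeBindingMode: WaitForFirstConsumer     # delay binding until a Pod is scheduled
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: sqlite-data                # hypothetical name for the SQLite-backed app
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-storage
  resources:
    requests:
      storage: 10Gi                # illustrative size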
I want to setup a PVC on AWS, where I need ReadWriteMany as access mode. Unfortunately, EBS only supports ReadWriteOnce.
How could I solve this?
I have seen that there is a beta provider for AWS EFS which supports ReadWriteMany, but as said, this is still beta, and its installation looks somewhat flaky.
I could use node affinity to force all pods that rely on the EBS volume to a single node, and stay with ReadWriteOnce, but this limits scalability.
Are there any other ways of how to solve this? Basically, what I need is a way to store data in a persistent way to share it across pods that are independent of each other.
Using EFS without automatic provisioning
The EFS provisioner may be beta, but EFS itself is not. Since EFS volumes can be mounted via NFS, you can simply create a PersistentVolume with an NFS volume source manually -- assuming that automatic provisioning is not a hard requirement on your side:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: my-efs-volume
spec:
  capacity:
    storage: 100Gi # Doesn't really matter, as EFS does not enforce it anyway
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  mountOptions:
  - hard
  - nfsvers=4.1
  - rsize=1048576
  - wsize=1048576
  - timeo=600
  - retrans=2
  nfs:
    path: /
    server: fs-XXXXXXXX.efs.eu-central-1.amazonaws.com
You can then claim this volume using a PersistentVolumeClaim and use it in a Pod (or multiple Pods) as usual.
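A matching claim could look roughly like this (the claim name is arbitrary; storageClassName is left empty so it binds to the manually created PV instead of triggering dynamic provisioning):

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-efs-claim               # hypothetical name
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""             # match the pre-created PV, which has no storage class
  resources:
    requests:
      storage: 100Gi               # not enforced by EFS, but must not exceed the PV's capacity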
Alternative solutions
If automatic provisioning is a hard requirement for you, there are alternative solutions you might look at: there are several distributed filesystems that you can roll out on your cluster that offer ReadWriteMany storage on top of Kubernetes and/or AWS. For example, you might take a look at Rook (which is basically a Kubernetes operator for Ceph). It's also officially still in a pre-release phase, but I've already worked with it a bit and it runs reasonably well.
There's also the GlusterFS operator, which already seems to have a few stable releases.
You can use Amazon EFS to create PersistentVolume with ReadWriteMany access mode.
Amazon EKS announced support for the Amazon EFS CSI Driver on Sep 19, 2019, which makes it simple to configure elastic file storage for both EKS and self-managed Kubernetes clusters running on AWS using standard Kubernetes interfaces.
Applications running in Kubernetes can use EFS file systems to share data between pods in a scale-out group, or with other applications running within or outside of Kubernetes. EFS can also help Kubernetes applications be highly available because all data written to EFS is written to multiple AWS Availability Zones. If a Kubernetes pod is terminated and relaunched, the CSI driver will reconnect the EFS file system, even if the pod is relaunched in a different AWS Availability Zone.
You can deploy the Amazon EFS CSI Driver to an Amazon EKS cluster following the EKS-EFS-CSI user guide, basically like this:
Step 1: Deploy the Amazon EFS CSI Driver
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=master"
Note: This command requires version 1.14 or greater of kubectl.
Step 2: Create an Amazon EFS file system for your Amazon EKS cluster
Step 2.1: Create a security group that allows inbound NFS traffic for your Amazon EFS mount points.
Step 2.2: Add a rule to your security group to allow inbound NFS traffic from your VPC CIDR range.
Step 2.3: Create the Amazon EFS file system configured with the security group you just created.
Now you are good to use EFS with ReadWriteMany access mode in your EKS Kubernetes project with the following sample manifest files:
1. efs-storage-class.yaml: Create the storage class
kind: StorageClass
apiVersion: storage.k8s.io/v1
metadata:
  name: efs-sc
provisioner: efs.csi.aws.com
kubectl apply -f efs-storage-class.yaml
2. efs-pv.yaml: Create PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: ftp-efs-pv
spec:
  storageClassName: efs-sc
  persistentVolumeReclaimPolicy: Retain
  capacity:
    storage: 10Gi # Doesn't really matter, as EFS does not enforce it anyway
  volumeMode: Filesystem
  accessModes:
  - ReadWriteMany
  csi:
    driver: efs.csi.aws.com
    volumeHandle: fs-642da695
Note: you need to replace the volumeHandle value with your Amazon EFS file system ID.
3. efs-pvc.yaml: Create PersistentVolumeClaim
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: ftp-pv-claim
  labels:
    app: ftp-storage-claim
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: efs-sc
That should be it. Refer to the aforementioned official user guide for a detailed explanation, where you can also find an example app to verify your setup.
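As a quick sanity check (not part of the guide; the Pod name, image and mount path are arbitrary), a Pod mounting the claim could look like this:

apiVersion: v1
kind: Pod
metadata:
  name: efs-test-pod               # hypothetical name
spec:
  containers:
  - name: app
    image: busybox
    command: ["sh", "-c", "echo hello > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: efs-storage
      mountPath: /data             # arbitrary mount path
  volumes:
  - name: efs-storage
    persistentVolumeClaim:
      claimName: ftp-pv-claim      # the PVC created above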
As you mention, an EBS volume with affinity and a node selector will stop scalability; however, with EBS only ReadWriteOnce will work.
Sharing my experience: if you are doing many operations on the file system and frequently pushing and fetching files, it can be slow with EFS, which can degrade application performance; the operation rate on EFS is slow.
However, you can use GlusterFS, with EBS volumes provisioned behind it. GlusterFS also supports ReadWriteMany, and it will be faster compared to EFS as it is backed by block storage (SSD).
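For reference, consuming GlusterFS through the legacy in-tree volume plugin looked roughly like this (the Endpoints object name and volume name are placeholders; newer setups usually go through Heketi, a CSI driver or Rook instead):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: gluster-pv                 # hypothetical name
spec:
  capacity:
    storage: 10Gi
  accessModes:
  - ReadWriteMany
  glusterfs:
    endpoints: glusterfs-cluster   # placeholder: Endpoints object listing the Gluster nodes
    path: myvol                    # placeholder: the GlusterFS volume name
    readOnly: false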