Is it possible to create a volume that is shared between all pods in a deployment but impossible to mount for any other pod?
Alternatively, one that is read/write from one deployment and read-only from any other pod?
That could be addressed in Kubernetes 1.12 (Q3 2018) with topology-aware dynamic provisioning, which is now in beta.
That means storage resources can now understand where they live.
This also includes beta support for AWS EBS and GCE PD.
See kubernetes/feature 561 and its doc PR 9939 (commit e1e6555)
See Storage / Storage Classes / Volume Binding Mode (beta in K8s 1.12)
By default, the Immediate mode indicates that volume binding and dynamic provisioning occurs once the PersistentVolumeClaim is created.
For storage backends that are topology-constrained and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod’s scheduling requirements. This may result in unschedulable Pods.
Allowed Topologies is how you restrict the topology of provisioned volumes to specific zones:
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
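For context, a complete StorageClass combining both settings might look roughly like this (a sketch assuming the in-tree GCE PD provisioner; the class name is an arbitrary example, and failure-domain.beta.kubernetes.io/zone was the zone label in use at the time):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-standard      # example name, an assumption
provisioner: kubernetes.io/gce-pd    # in-tree GCE PD provisioner
parameters:
  type: pd-standard
volumeBindingMode: WaitForFirstConsumer   # delay binding until a Pod using the PVC is scheduled
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b

With such a class, a PVC stays Pending until a Pod that uses it is scheduled, and the volume is then provisioned in one of the allowed zones.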
Related
I have a Kubernetes v1.17.0 cluster with multiple nodes. I've created a PVC with its access mode set to RWO. From the Kubernetes docs:
ReadWriteOnce -- the volume can be mounted as read-write by a single node
I'm using a Cinder volume plugin which doesn't support ReadWriteMany.
When I create two different deployments that mount the same PVC, Kubernetes sometimes schedules them on two different nodes, which causes the pods to fail.
Is this desired behaviour or is there a problem in my configuration?
As I gathered from your answers to the comments, you do not want to use affinity rules but want the scheduler to perform this work for you.
It seems that this issue has been known since at least 2016 but has not yet been resolved, as the scheduling is considered to be working as expected: https://github.com/kubernetes/kubernetes/issues/26567
You can read the details in the issue, but the core problem seems to be that in the definition of Kubernetes, a ReadWriteOnce volume can never be accessed by two Pods at the same time. By definition. What would need to be implemented is a flag saying "it is OK for this RWO volume to be accessed by two Pods at the same time, even though it is RWO". But this functionality has not been implemented yet.
In practice, you can typically work around this issue by using a Recreate Deployment Strategy: .spec.strategy.type: Recreate. Alternatively, use the affinity rules as described by the other answers.
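For illustration, the relevant part of a Deployment using the Recreate strategy could look like the sketch below (names, image, and the claim name are placeholders, not from the question):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                  # placeholder name
spec:
  replicas: 1
  strategy:
    type: Recreate             # terminate the old Pod before starting the new one,
                               # so the RWO volume is never needed on two nodes at once
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest    # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-rwo-pvc   # placeholder claim name

Note that Recreate implies a brief downtime during each rollout, since the old Pod is stopped before the new one starts.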
Provisioning the PV/PVC and deploying new pods on the same node can only be achieved via node affinity. However, if you want Kubernetes to decide this for you, you will have to use inter-pod affinity.
Just to verify that you are doing everything the right way, please refer to this.
Persistent volumes in Kubernetes can be tied to a node or an availability zone because of the underlying hardware: a storage drive within a server or a SAN within a single datacenter cannot be moved around by the storage provisioner.
Now how does the storage provisioner know on which node or in which availability zone it needs to create the persistent volume? That's why persistent volume claims have a volume binding mode, which is set to WaitForFirstConsumer in that case. This means the provisioning happens after the first pod that mounts the persistent volume has been scheduled. For more details, read here.
When a second pod is scheduled, it might run on another node or another availability zone unless you tell the scheduler to run the pod on the same node or in the same availability zone as the first pod by using inter-pod affinity:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      # adjust the labels so that they identify your pod
      matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
        - myapp
    # make pod run on the same node
    topologyKey: kubernetes.io/hostname
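For reference, this block sits under spec.affinity in the Pod template of the Deployment whose pods should follow the first one; a rough excerpt (the Deployment name is a placeholder, and selector/containers are omitted):

apiVersion: apps/v1
kind: Deployment
metadata:
  name: second-app             # placeholder name
spec:
  template:
    spec:
      affinity:
        podAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app.kubernetes.io/name
                operator: In
                values:
                - myapp        # label of the pods to co-locate with
            topologyKey: kubernetes.io/hostname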
I've been working with Kubernetes for quite a while, but I still often get confused about Volume, PersistentVolume and PersistentVolumeClaim. It would be nice if someone could briefly summarize the differences between them.
Volume - For a pod to reference storage that is external, it needs a volume spec. This volume can come from a ConfigMap, a Secret, a PersistentVolumeClaim, a hostPath, etc.
PersistentVolume - It is the representation of storage that has been made available. The plugins for the cloud provider make it possible to create this resource.
PersistentVolumeClaim - This claims specific resources, and if a PersistentVolume matching the claim's requirements is available, the claim gets bound to that PersistentVolume.
At this point the PVC/PV still aren't used. Then in the Pod spec, the pod uses the claim as a volume, and the storage is attached to the Pod.
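To make the chain concrete, here is a minimal sketch using a hostPath PersistentVolume purely for illustration (all names are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: example-pv
spec:
  storageClassName: manual
  capacity:
    storage: 1Gi
  accessModes:
  - ReadWriteOnce
  hostPath:
    path: /mnt/data            # storage made available on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: example-pvc
spec:
  storageClassName: manual
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi             # the claim is bound to a matching PV
---
apiVersion: v1
kind: Pod
metadata:
  name: example-pod
spec:
  containers:
  - name: app
    image: nginx
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data                 # the Pod references the claim as a volume
    persistentVolumeClaim:
      claimName: example-pvc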
These are all in a Kubernetes application context. To keep applications portable between different Kubernetes platforms, it is good to abstract the infrastructure away from the application. Here I will explain the Kubernetes objects that belong to the Application config and those that belong to the Platform config. If your application runs on both e.g. GCP and AWS, you will need two sets of platform configs, one for GCP and one for AWS.
Application config
Volume
A pod may mount volumes. The source for volumes can be different things, e.g. a ConfigMap, Secret or a PersistentVolumeClaim
PersistentVolumeClaim
A PersistentVolumeClaim represents a claim of a specific PersistentVolume instance. For portability this claim can be for a specific StorageClass, e.g. SSD.
Platform config
StorageClass
A StorageClass represents a PersistentVolume type with specific properties, e.g. SSD. But the StorageClass definition is different on each platform, e.g. one definition on AWS or Azure and another on GCP or on Minikube.
PersistentVolume
This is a specific volume on the platform, and it differs between platforms, e.g. awsElasticBlockStore on AWS or gcePersistentDisk on GCP. This is the instance that holds the actual data.
Minikube example
See Configure a Pod to Use a PersistentVolume for Storage for a full example on how to use PersistentVolume, StorageClass and Volume for a Pod using Minikube and a hostPath.
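As a small sketch of that split (class names and parameters are assumptions, not from the answer): the application ships one PVC that refers to a StorageClass called fast, while each platform defines its own fast class.

# Application config (same on every platform)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast
  resources:
    requests:
      storage: 10Gi
---
# Platform config for GCP
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd
---
# Platform config for AWS (used instead of the GCP one when deploying there)
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2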
I create a Deployment with a volumeMount that references a PersistentVolumeClaim, along with a memory request, on a cluster with nodes in 3 different AZs: us-west-2a, us-west-2b, and us-west-2c.
The Deployment takes a while to start while the PersistentVolume is being dynamically created but they both eventually start up.
The problem I am running into is that the PersistentVolume is created in us-west-2c, and the only node the pod can run on there is already over-allocated.
Is there a way for me to create the Deployment and claim such that the volume is not provisioned in a zone where no pod can start up?
I believe you're looking for the Topology Awareness feature.
Topology Awareness
In Multi-Zone clusters, Pods can be spread across Zones in a Region. Single-Zone storage backends should be provisioned in the Zones where Pods are scheduled. This can be accomplished by setting the Volume Binding Mode.
Kubernetes released the topology-aware dynamic provisioning feature in version 1.12, and I believe this will solve your issue.
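In practice that means provisioning through a StorageClass with volumeBindingMode: WaitForFirstConsumer, for example something like this (a sketch assuming the in-tree AWS EBS provisioner; the class name is an arbitrary example):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-wait               # example name, an assumption
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
volumeBindingMode: WaitForFirstConsumer   # provision in the AZ where the Pod is scheduled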
It's mentioned on the official Kubernetes website as below for PV and PVC.
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., can be mounted once read/write or many times read-only).
Who is the administrator here, when they mention it from the persistent volume perspective?
An administrator in this context is the admin of the cluster: whoever is deploying the PV/PVC (an operations engineer, system engineer, SysAdmin).
For example - an engineer can configure AWS Elastic File System to have space available in the Kubernetes cluster, then use a PV/PVC to make that available to a specific pod container in the cluster. This means that if the pod is destroyed for whatever reason, the data in the PVC persists and is available to other resources.
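A rough sketch of what such an EFS-backed PV/PVC could look like, mounted over NFS (the file system ID, region, and sizes are placeholders):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: efs-pv
spec:
  capacity:
    storage: 5Gi               # EFS is elastic; the value is mostly informational
  accessModes:
  - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  nfs:
    server: fs-0123456789abcdef0.efs.us-east-1.amazonaws.com   # placeholder file system ID and region
    path: /
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: efs-pvc
spec:
  accessModes:
  - ReadWriteMany
  storageClassName: ""         # bind to the pre-created PV above, no dynamic provisioning
  resources:
    requests:
      storage: 5Gi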
Using Kubernetes 1.7.0, the intention here is to be able to deploy MySQL / MongoDB / etc. and use local disk as the storage backing, while webheads and processing pods can be autoscaled by Kubernetes. To these ends, I've:
Set up & deployed the Local Persistent Storage provisioner to automatically provision locally attached disk to pods' Persistent Volume Claims.
Manually created a Persistent Volume Claim, which succeeds, and the local volume is attached
Attempted to deploy MariaDB via helm by
helm install --name mysql --set persistence.storageClass=default stable/mariadb
This appears to succeed; but by going to the dashboard, I get
Storage node affinity check failed for volume "local-pv-8ef6e2af" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[kubemaster]}] does not match node labels
I suspect this might be due to helm's charts not including nodeAffinity. Other than updating each chart manually, is there a way to tell helm to deploy to the same node where the provisioner has the volume?
Unfortunately, no. You will need to specify node affinity so that the Pod lands on the node where the local storage is located. See the docs on Node Affinity to know what to add to the helm chart.
I suspect it would look something like the following in your case.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kubemaster
As an aside, this is something that will happen, not just at the node level, but at the zone level for cloud environments like AWS and GCP as well. In those environments, persistent disks are zonal and will require you to set NodeAffinity so that the Pods land in the zone with the persistent disk when deploying to a multi-zone cluster.
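For the zonal case the pattern is the same, just keyed on the zone label instead of the hostname; a sketch (the zone value is a placeholder):

affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: failure-domain.beta.kubernetes.io/zone   # zone label in use at the time
          operator: In
          values:
          - us-east-1a         # placeholder zone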
Also as an aside, it looks like you are deploying to the Kubernetes master? If so, that may not be advisable since MySQL could potentially affect the master's operation.