Deploying to Kubernetes' Local Persistent Storage via Helm

Using Kubernetes 1.7.0, the intention here is to be able to deploy MySQL / MongoDB / etc. backed by local disk, while webheads and processing pods can be autoscaled by Kubernetes. To these ends, I've:
Set up and deployed the Local Persistent Storage provisioner to automatically provision locally attached disk to pods' Persistent Volume Claims.
Manually created a Persistent Volume Claim, which succeeds; the local volume is attached.
Attempted to deploy MariaDB via helm with
helm install --name mysql --set persistence.storageClass=default stable/mariadb
This appears to succeed, but in the dashboard I get
Storage node affinity check failed for volume "local-pv-8ef6e2af" : NodeSelectorTerm [{Key:kubernetes.io/hostname Operator:In Values:[kubemaster]}] does not match node labels
I suspect this might be due to helm's charts not including node affinity. Other than updating each chart manually, is there a way to tell helm to deploy to the same node where the provisioner has the volume?

Unfortunately, no. You will need to specify node affinity so that the Pod lands on the node where the local storage is located. See the docs on Node Affinity to know what to add to the helm chart.
I suspect it would look something like the following in your case.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kubemaster
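If the chart happens to expose an affinity value, you could avoid editing its templates by passing the override in a values file. This is only a sketch: it assumes the chart's templates actually render an `affinity` value (the stable/mariadb chart of that era did not, in which case editing the chart remains necessary).

```yaml
# values-local.yaml -- hypothetical override file; works only if the chart
# templates a user-supplied `affinity` value into the pod spec.
affinity:
  nodeAffinity:
    requiredDuringSchedulingIgnoredDuringExecution:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - kubemaster
# Then install with:
#   helm install --name mysql -f values-local.yaml stable/mariadb
```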
As an aside, this is something that will happen, not just at the node level, but at the zone level for cloud environments like AWS and GCP as well. In those environments, persistent disks are zonal and will require you to set NodeAffinity so that the Pods land in the zone with the persistent disk when deploying to a multi-zone cluster.
Also as an aside, it looks like you are deploying to the Kubernetes master? If so, that may not be advisable, since MySQL could potentially affect the master's operation.

Related

Kubernetes ignoring PVC RWO ACCESS MODE and deploying pods on different nodes

I have a Kubernetes v1.17.0 cluster with multiple nodes. I've created a PVC with the access mode set to RWO. From the Kubernetes docs:
ReadWriteOnce -- the volume can be mounted as read-write by a single node
I'm using a Cinder volume plugin which doesn't support ReadWriteMany.
When I create two different deployments that mount the same PVC, Kubernetes sometimes schedules them on two different nodes, which causes the pods to fail.
Is this desired behaviour or is there a problem in my configuration?
As I gather from your answers to the comments, you do not want to use affinity rules but want the scheduler to perform this work for you.
It seems that this issue has been known since at least 2016 but has not yet been resolved, as the scheduling is considered to be working as expected: https://github.com/kubernetes/kubernetes/issues/26567
You can read the details in the issue, but the core problem seems to be that in the definition of Kubernetes, a ReadWriteOnce volume can never be accessed by two Pods at the same time. By definition. What would need to be implemented is a flag saying "it is OK for this RWO volume to be accessed by two Pods at the same time, even though it is RWO". But this functionality has not been implemented yet.
In practice, you can typically work around this issue by using a Recreate Deployment Strategy: .spec.strategy.type: Recreate. Alternatively, use the affinity rules as described by the other answers.
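The Recreate workaround might look like the following minimal Deployment; the names, image, and claim name are placeholders for illustration.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp                    # hypothetical name
spec:
  replicas: 1
  strategy:
    type: Recreate               # kill the old pod before starting the new one,
                                 # so the RWO volume is never mounted twice
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myapp:latest      # placeholder image
        volumeMounts:
        - name: data
          mountPath: /data
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: my-rwo-pvc  # the existing ReadWriteOnce claim
```

With the default RollingUpdate strategy, the new pod is started while the old one is still running, which is exactly the situation an RWO volume cannot satisfy across nodes.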
The provisioning of the PV/PVC and the deployment of new pods on the same node can only be achieved via node affinity. However, if you want Kubernetes to decide this for you, you will have to use inter-pod affinity.
However, just to verify that you are doing everything the right way, please refer to this.
Persistent volumes in Kubernetes can be tied to a node or an availability zone because of the underlying hardware: a storage drive within a server or a SAN within a single datacenter cannot be moved around by the storage provisioner.
Now, how does the storage provisioner know on which node or in which availability zone it needs to create the persistent volume? That's why persistent volume claims have a volume binding mode, which is set to WaitForFirstConsumer in this case. This means the provisioning happens after the first pod that mounts the persistent volume has been scheduled. For more details, read here.
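A StorageClass using this binding mode could look like the following sketch; the class name is illustrative, and the no-provisioner is what static local volumes typically use.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage                        # illustrative name
provisioner: kubernetes.io/no-provisioner    # static local volumes have no dynamic provisioner
volumeBindingMode: WaitForFirstConsumer      # delay binding until a consuming pod is scheduled
```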
When a second pod is scheduled, it might run on another node or another availability zone unless you tell the scheduler to run the pod on the same node or in the same availability zone as the first pod by using inter-pod affinity:
podAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
  - labelSelector:
      # adjust the labels so that they identify your pod
      matchExpressions:
      - key: app.kubernetes.io/name
        operator: In
        values:
        - myapp
    # make the pod run on the same node
    topologyKey: kubernetes.io/hostname

Does stellar core deployment on k8s need persistent storage?

I want to deploy stellar core on k8s with CATCHUP COMPLETE. I'm using this docker image: satoshipay/stellar-core
The docker image docs mention that /data is used to store some information about the DB. And I've seen that the helm template uses a persistent volume and mounts it at /data.
I was wondering what will happen if I use a Deployment instead of the StatefulSet and I restart the pod, update its docker version, or delete it? Does it initialize the DB again?
Also does the stellar core need any extra storage for the catchup?
StatefulSet vs Deployment
A StatefulSet "provides guarantees about the ordering and uniqueness of these Pods".
If your application needs to be brought up in a specific order, use a StatefulSet.
Storage
Definitely leverage a persistent volume for the database. From the K8s docs:
On-disk files in a Container are ephemeral
Since it appears you're deploying some kind of blockchain application, losing that on-disk data could cause significant delays on startup.
In a Deployment, you specify a PersistentVolumeClaim that is shared by all pod replicas. In other words, a shared volume.
The backing storage obviously must have the ReadWriteMany or ReadOnlyMany access mode if you have more than one replica pod.
In a StatefulSet, you specify volumeClaimTemplates so that each replica pod gets a unique PersistentVolumeClaim associated with it.
In other words, no shared volume.
A StatefulSet is useful for running clustered workloads, e.g. a Hadoop cluster or a MySQL cluster, where each node has its own storage.
So in your case, a StatefulSet-based solution is better, because it gives you more isolation (no shared volumes).
If you use a Deployment-based solution and restart the pod, update its docker version, or delete it, your DB will be initialized again.
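A StatefulSet along these lines keeps /data on a per-replica PVC that survives pod restarts, updates, and deletions. This is only a sketch: the names, replica count, and storage size are assumptions, not values from the stellar-core helm chart.

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: stellar-core             # illustrative name
spec:
  serviceName: stellar-core      # headless service the StatefulSet requires
  replicas: 1
  selector:
    matchLabels:
      app: stellar-core
  template:
    metadata:
      labels:
        app: stellar-core
    spec:
      containers:
      - name: core
        image: satoshipay/stellar-core
        volumeMounts:
        - name: data
          mountPath: /data       # persists across pod restarts and updates
  volumeClaimTemplates:          # one unique PVC per replica, never shared
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 10Gi          # placeholder size
```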
Regarding catchup:
In general, running CATCHUP_COMPLETE=true in docker containers is not recommended, as they have limited resources by default (if you really want to do it, make sure to give them access to more resources: CPU, memory, and disk space).
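In Kubernetes, granting those resources is done with requests and limits on the container spec. The numbers below are placeholders, not a recommendation for stellar-core sizing.

```yaml
# Fragment of a container spec; the actual values should be sized
# for your catchup workload, these are placeholders.
resources:
  requests:
    cpu: "1"
    memory: 2Gi
  limits:
    cpu: "2"
    memory: 4Gi
```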

Kubernetes local persistent volume cannot mount an existing path [duplicate]

I try, I try, but Rancher 2.1 fails to deploy the "mongo-replicaset" Catalog App, with Local Persistent Volumes configured.
How to correctly deploy a mongo-replicaset with Local Storage Volume? Any debugging techniques appreciated since I am new to rancher 2.
I follow the four steps A-D below, but the first pod deployment never ends. What's wrong with it? Logs and result screens are at the end. Detailed configuration can be found here.
Note: Deployment without Local Persistent Volumes succeeds.
Note: Deployment with a Local Persistent Volume and the "mongo" image (without the replicaset version) succeeds.
Note: Deployment with both mongo-replicaset and a Local Persistent Volume fails.
Step A - Cluster
Create a rancher instance, and:
Add three nodes: a worker, a worker etcd, a worker control plane
Add a label on each node: name one, name two and name three for node Affinity
Step B - Storage class
Create a storage class with these parameters:
volumeBindingMode : WaitForFirstConsumer (seen here)
name : local-storage
Step C - Persistent Volumes
Add 3 persistent volumes like this:
type : local node path
Access Mode: Single Node RW, 12Gi
storage class: local-storage
Node Affinity: name one (two for second volume, three for third volume)
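The three persistent volumes from Step C might each look like the following manifest; the PV name and host path are assumptions, and the node affinity assumes the Step A labels are of the form name=one, name=two, name=three.

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: mongo-pv-one              # illustrative name, one PV per node
spec:
  capacity:
    storage: 12Gi
  accessModes:
  - ReadWriteOnce                 # "Single Node RW"
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage # the class created in Step B
  local:
    path: /mongo                  # assumed node-local path
  nodeAffinity:                   # pin the volume to the labeled node
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: name               # the custom label added in Step A
          operator: In
          values:
          - one                   # two / three for the other volumes
```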
Step D - Mongo-replicaset Deployment
From the catalog, select Mongo-replicaset and configure it like this:
replicaSetName: rs0
persistentVolume.enabled: true
persistentVolume.size: 12Gi
persistentVolume.storageClass: local-storage
Result
After performing steps A-D, the newly created mongo-replicaset app stays infinitely in the "Initializing" state.
The associated mongo workload contains only one pod, instead of three. And this pod has two 'crashed' containers, bootstrap and mongo-replicaset.
Logs
This is the output from the 4 containers of the only running pod. There is no error, no problem.
I can't figure out what's wrong with this configuration, and I don't have any tools or techniques to analyze the problem. Detailed configuration can be found here. Please ask me for more command results.
Thank you
All this configuration is correct.
One detail is missing: since Rancher is a containerized deployment of Kubernetes, the kubelets are deployed on each node in docker containers, and they don't have access to the OS's local folders.
You need to add a volume bind for the kubelets; that way, K8s will be able to create the mongo pod with this same binding.
In rancher:
Edit the cluster yaml (Cluster > Edit > Edit as Yaml)
Add the following entry under "services" node:
kubelet:
  extra_binds:
    - "/mongo:/mongo:rshared"


Deployment specific volumes in Kubernetes

Is it possible to create a volume that is shared between all pods in a deployment but impossible to mount for any other pod?
Alternatively, one that is read/write from one deployment and read-only from any other pod?
That could be addressed in Kubernetes 1.12 (Q3 2018) with topology-aware dynamic provisioning, which is now in beta.
That means storage resources can now understand where they live.
This also includes beta support for AWS EBS and GCE PD.
See kubernetes/feature 561 and its doc PR 9939 (commit e1e6555)
See Storage / Storage Classes / Volume Binding Mode (beta in K8s 1.12)
By default, the Immediate mode indicates that volume binding and dynamic provisioning occurs once the PersistentVolumeClaim is created.
For storage backends that are topology-constrained and not globally accessible from all Nodes in the cluster, PersistentVolumes will be bound or provisioned without knowledge of the Pod’s scheduling requirements. This may result in unschedulable Pods.
Allowed Topologies is how to restrict the topology of provisioned volumes to specific zones.
allowedTopologies:
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
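That stanza lives inside a StorageClass; combined with WaitForFirstConsumer, it could look like the following sketch for GCE (the class name is illustrative).

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: topology-aware-standard          # illustrative name
provisioner: kubernetes.io/gce-pd
volumeBindingMode: WaitForFirstConsumer  # provision in the zone where the pod lands
allowedTopologies:                       # and only within these zones
- matchLabelExpressions:
  - key: failure-domain.beta.kubernetes.io/zone
    values:
    - us-central1-a
    - us-central1-b
```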