Kubernetes trouble with StatefulSet and 3 PersistentVolumes

I'm in the process of creating a StatefulSet based on this YAML, which will have 3 replicas. I want each of the 3 pods to connect to a different PersistentVolume.
For the persistent volumes I'm using 3 objects that look like this, with only the name changed (pvvolume, pvvolume2, pvvolume3):
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pvvolume
  labels:
    type: local
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs"
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: mongo-persistent-storage-mongo-0
The first of the 3 pods in the StatefulSet seems to be created without issue.
The second fails with the error pod has unbound PersistentVolumeClaims
Back-off restarting failed container.
Yet if I go to the tab showing PersistentVolumeClaims, the second one that was created appears to have bound successfully.
If it was successful why does the pod think it failed?

I want each of the 3 pods to connect to a different PersistentVolume.
For that to work properly you will need either:
a provisioner (the link you posted has examples of how to set up a provisioner on AWS, Azure, Google Cloud and minikube), or
a volume capable of being mounted multiple times (such as an NFS volume). Note, however, that in that case all your pods read from and write to the same folder, which can lead to issues when they are not designed to lock/write to the same data concurrently. The usual use case for this is an upload folder that pods save to and that is later only read from. SQL databases (such as MySQL), on the other hand, are not meant to write to such a shared folder.
Instead of either of those, your volume manifest uses hostPath (pointing to /nfs) and sets it to ReadWriteOnce (so only one claimant can use it). You are also using 'standard' as the storage class, while the link you gave only shows fast and slow ones, so you probably created that storage class yourself as well.
The second fails with the error pod has unbound PersistentVolumeClaims
Back-off restarting failed container
That is because the first pod already took its claim (ReadWriteOnce, hostPath), and the second pod can't reuse the same one unless a proper provisioner or multi-mount access is set up.
If it was successful why does the pod think it failed?
All PVCs were successfully bound to their accompanying PVs. But you are never binding the second and third PVCs to the second and third pods. You keep retrying with the first claim on the second pod, and the first claim is already bound (to the first pod) in ReadWriteOnce mode, so it can't be bound to the second pod as well, and you get the error.
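For reference, a hedged sketch of what distinct per-pod claim references would have looked like in your static setup (the PVC names follow the <template>-<statefulset>-<ordinal> convention your StatefulSet generates; the extra host path is an assumption, and the hostPath/ReadWriteOnce caveats above still apply):

# pvvolume  -> claimRef.name: mongo-persistent-storage-mongo-0
# pvvolume2 -> claimRef.name: mongo-persistent-storage-mongo-1
# pvvolume3 -> claimRef.name: mongo-persistent-storage-mongo-2
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pvvolume2
spec:
  storageClassName: standard
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/nfs2"               # a separate host path per volume (assumed)
  claimRef:
    kind: PersistentVolumeClaim
    namespace: default
    name: mongo-persistent-storage-mongo-1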
Suggested approach
Since you reference /nfs as your host path, it may be safe to assume that you are using some kind of NFS-backed file system, so here is one alternative setup that lets you mount dynamically provisioned persistent volumes over NFS to as many pods in the StatefulSet as you want.
Notes:
This only answers the original question of mounting persistent volumes across StatefulSet replicas, under the assumption of NFS sharing.
NFS is not really advisable for dynamic data such as a database. The usual use case is an upload folder or a moderate logging/backup folder. A database (SQL or NoSQL) is usually a no-go on NFS.
For mission- or time-critical applications you might want to benchmark/stress-test carefully before taking this approach to production, since both k8s and the external PV add layers/latency in between. For some applications this might suffice, but be warned.
You have limited control over the names of dynamically created PVs (k8s adds a suffix to newly created ones, and reuses available old ones if told to do so), but k8s will keep them after a pod gets terminated and assign the first available one to a new pod, so you won't lose state/data. This is something you can control with reclaim policies, though.
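For instance, the reclaim policy of an already-provisioned PV can be switched to Retain with a patch like the one below (the PV name is a placeholder):

kubectl patch pv <pv-name> -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'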
Steps:
For this to work you will first need to install the NFS provisioner from here:
https://github.com/kubernetes-incubator/external-storage/tree/master/nfs. Mind you, the installation is not complicated, but it has some steps where you have to take a careful approach (permissions, setting up NFS shares, etc.), so it is not just a fire-and-forget deployment. Take your time installing the NFS provisioner correctly. Once it is properly set up, you can continue with the suggested manifests below:
Storage class manifest:
kind: StorageClass
apiVersion: storage.k8s.io/v1beta1
metadata:
  name: sc-nfs-persistent-volume
# if you changed this during provisioner installation, update it here as well
provisioner: example.com/nfs
StatefulSet (important excerpt only):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ss-my-app
spec:
  replicas: 3
  ...
  selector:
    matchLabels:
      app: my-app
      tier: my-mongo-db
  ...
  template:
    metadata:
      labels:
        app: my-app
        tier: my-mongo-db
    spec:
      ...
      containers:
        - image: ...
          ...
          volumeMounts:
            - name: persistent-storage-mount
              mountPath: /wherever/on/container/you/want/it/mounted
          ...
      ...
  volumeClaimTemplates:
    - metadata:
        name: persistent-storage-mount
      spec:
        storageClassName: sc-nfs-persistent-volume
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 10Gi
  ...
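Once the provisioner, StorageClass and StatefulSet are applied, each replica should end up with its own dynamically provisioned NFS-backed volume. A quick way to check (the file names are placeholders for wherever you saved the manifests above):

kubectl apply -f sc-nfs-persistent-volume.yaml
kubectl apply -f ss-my-app.yaml
kubectl get pvc   # expect persistent-storage-mount-ss-my-app-0, -1 and -2, all Bound
kubectl get pv    # one dynamically provisioned PV per claim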


Reuse PV in Deployment

What do I need?
A Deployment with 2 pods which read from the SAME volume (PV). The volume must be shared between pods in RW mode.
Note: I already have a Rook Ceph cluster with a defined StorageClass "rook-cephfs" which allows this capability. This SC also has the Retain reclaim policy.
This is what I did:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-nginx
spec:
  accessModes:
    - "ReadWriteMany"
  resources:
    requests:
      storage: "10Gi"
  storageClassName: "rook-cephfs"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      serviceAccountName: default
      containers:
        - name: nginx
          image: nginx:latest
          imagePullPolicy: Always
          ports:
            - name: http
              containerPort: 80
          volumeMounts:
            - name: pvc-data
              mountPath: /data
      volumes:
        - name: pvc-data
          persistentVolumeClaim:
            claimName: data-nginx
It works! Both nginx containers share the volume.
Problem:
If I delete all the resources (except the PV) and recreate them, a NEW PV is created instead of reusing the old one. So basically, the new volume is empty.
The OLD PV gets the status "Released" instead of "Available".
I realized that if I apply a patch to the PV to remove the claimRef.uid:
kubectl patch pv $PV_NAME --type json -p '[{"op": "remove", "path": "/spec/claimRef/uid"}]'
and then redeploy, it works.
But I don't want to do this manual step. I need this automated.
I also tried the same configuration with a StatefulSet and got the same problem.
Any solution?
Make sure to use reclaimPolicy: Retain in your StorageClass. It tells Kubernetes to keep the PV (and its data) after the claim is deleted, so it can be reused.
Ref: https://kubernetes.io/docs/tasks/administer-cluster/change-pv-reclaim-policy/
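As a minimal sketch, assuming your existing class was created by the Rook CephFS CSI driver (the provisioner string below is an assumption; copy it, and the parameters block, from your current rook-cephfs class):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-cephfs              # or a new class name if you want to keep the old one
provisioner: rook-ceph.cephfs.csi.ceph.com   # assumed; use whatever your existing class declares
# parameters: copy the parameters block from your existing rook-cephfs class here
reclaimPolicy: Retain

Note that the class's reclaim policy only applies to PVs provisioned after the class is created; existing PVs can be patched individually, as described in the link above.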
But I don't want to do this manual step. I need this automated.
Based on the official documentation, it is unfortunately not possible to automate this. First, look at the reclaim policy:
PersistentVolumes that are dynamically created by a StorageClass will have the reclaim policy specified in the reclaimPolicy field of the class, which can be either Delete or Retain. If no reclaimPolicy is specified when a StorageClass object is created, it will default to Delete.
So, we have 2 supported options for Reclaim Policy: Delete or Retain.
The Delete option is not for you, because:
for volume plugins that support the Delete reclaim policy, deletion removes both the PersistentVolume object from Kubernetes, as well as the associated storage asset in the external infrastructure, such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume. Volumes that were dynamically provisioned inherit the reclaim policy of their StorageClass, which defaults to Delete. The administrator should configure the StorageClass according to users' expectations; otherwise, the PV must be edited or patched after it is created.
The Retain option allows manual reclamation of the resource:
When the PersistentVolumeClaim is deleted, the PersistentVolume still exists and the volume is considered "released". But it is not yet available for another claim because the previous claimant's data remains on the volume. An administrator can manually reclaim the volume with the following steps.
Delete the PersistentVolume. The associated storage asset in external infrastructure (such as an AWS EBS, GCE PD, Azure Disk, or Cinder volume) still exists after the PV is deleted.
Manually clean up the data on the associated storage asset accordingly.
Manually delete the associated storage asset, or if you want to reuse the same storage asset, create a new PersistentVolume with the storage asset definition.
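In practice those manual steps look roughly like this (the PV name is a placeholder, and the recreated manifest is hypothetical):

kubectl get pv                      # the old PV shows STATUS Released
kubectl delete pv <pv-name>         # the data on the backing storage asset is not deleted
# ...either clean up the data out-of-band, or keep it and write a new PV
# manifest that points at the same storage asset, then:
kubectl apply -f recreated-pv.yaml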

Kubernetes: Using an ordinal number in a claimName?

I have a StatefulSet that is running great, and it has ReadWriteMany PVCs. I need to share these PVCs with another StatefulSet.
Does anybody know how I can add the ordinal number into the claimName?
Basically I have a backendService that is a StatefulSet with 2 replicas, so it has a volumeClaimTemplate defined; hence it has 2 volumes, service-data-service-0 and service-data-service-1, for example.
The other StatefulSet has its own data volume, but it also needs to share the data volume from the first StatefulSet.
There is a one-to-one mapping, meaning the volume with ordinal 0 in the lower service needs to be added to pod 0, and likewise the volume with ordinal 1 to pod 1.
I am a little confused how I am able to do this. It's easy with a Deployment, because technically you have 2 deployments, so each deployment can be pointed strictly at the correct service-data-service-XX (where XX is the ordinal number of the lower service, i.e. 0, 1, etc.).
In my head, as pseudo-code, I have this. Can anyone help?
volumes:
  - name: lnd2-data-volume
    persistentVolumeClaim:
      # This volumes section is in the higher service but shares a data volume
      # with the lower service
      claimName: service-data-service-{{ "SOME TEMPLATE HERE to give me either 0 or 1 for the current POD ordinal number" }}
Any ideas?
For the TL;DR version, please go to the solution below.
What you are trying to achieve is not doable with a StatefulSet (STS) today.
Due to the design of the StatefulSet controller, claims need to have unique identifiers in order to be mapped to their corresponding pods, and they cannot be reused between different StatefulSet applications.
So no matter what claim name you specify within volumes as part of the Pod template inside the StatefulSet definition (e.g. claimName=service-data-service-0), it will always be overwritten by the StatefulSet controller for each Pod it controls, using the following naming scheme:
PVC name = <claim.Name>-<set.Name>-<ordinal>
where:
claim.Name - the claim on the STS's volumeClaimTemplates list matching a volumeMount in the PodTemplateSpec
set.Name - the StatefulSet name
ordinal - the Pod's ordinal index, from 0 up to replicas - 1
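As an illustration, assuming a hypothetical StatefulSet named web with a volumeClaimTemplate named www and replicas: 3, the controller generates the claims:

www-web-0
www-web-1
www-web-2

which you can confirm with kubectl get pvc.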
My observations:
An existing PVC (in ReadWriteMany mode) can be used by an STS only when you introduce the StatefulSet into your cluster for the first time (i.e., the PVC is not yet owned by another workload).
For example, an STS like this one:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: peb
spec:
  ...
  volumeClaimTemplates:
    - metadata:
        name: fileserver-claim
      spec:
        accessModes: [ "ReadWriteMany" ]
        storageClassName: ""
        resources:
          requests:
            storage: 1Gi
would consume the existing PVC:
fileserver-claim-peb-0
with an accompanying event seen in the API server logs:
The PVC 'fileserver-claim-peb-0' already exists
and because there cannot be another STS with the same name (Pod 'peb-0' is unique in the cluster, as is its claim name), your options end here.
Solution:
Manually pre-provision a couple of PVs that use the same associated storage asset (e.g. NFS-based volumes supporting the RWX access mode), and inside your STS's volumeClaimTemplates reference the existing unbound PV by name (volumeName), e.g.:
...
volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes:
        - "ReadWriteOnce"
      volumeName: fileserver-claim-peb
      resources:
        requests:
          storage: 1Gi
I think this is a recipe for sharing the same data storage between different StatefulSets.
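For completeness, a minimal sketch of such a manually pre-provisioned PV, assuming an NFS export (the server address, export path and PV name are assumptions; you would create one PV like this per replica, all pointing at the same export):

apiVersion: v1
kind: PersistentVolume
metadata:
  name: fileserver-claim-peb
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: ""
  nfs:
    server: 10.0.0.10            # assumed NFS server
    path: /exports/shared-data   # assumed export shared by all PVs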

How to store my pod logs in a persistent storage?

I can view logs for my pods using kubectl logs <pod-name>. But I want to persist these logs in a volume (some kind of persistent storage), because container logs get wiped out if the pods go down. Is there a way to do this? Do I have to write some sort of a script?
I have read many answers but I still do not understand how to go about it, any help is appreciated. Thanks!
Under Logging Architecture, the Kubernetes documentation goes through a couple of ways to set up logging in your cluster.
The most interesting one for you might be the cluster-level logging architecture:
While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:
Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application
There are many solutions for collecting pod logs and shipping them to a centralized location such as:
fluentd
splunk
elastic
Keeping logs outside of the cluster has benefits. If your cluster begins to have issues, it's likely that your in-cluster logging architecture will face them as well.
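For the sidecar option from the list above, here is a minimal sketch (the application image, log path and names are assumptions; the shared volume is an emptyDir here, but it could just as well be backed by the PersistentVolumeClaim described in the answers below):

apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: my-app:latest                 # assumed app writing to /var/log/app/app.log
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  - name: log-tailer                     # sidecar streams the log file to its own stdout
    image: busybox:1.28
    command: ["sh", "-c", "tail -n+1 -F /var/log/app/app.log"]
    volumeMounts:
    - name: app-logs
      mountPath: /var/log/app
  volumes:
  - name: app-logs
    emptyDir: {}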
You will need to mount the logs directory inside the container to the host machine as well, using the PersistentVolume and PersistentVolumeClaim.
This way you can persist these logs even if the container is killed.
Create the PersistentVolume and PersistentVolumeClaim for the log path and use them as volume mounts to the kubernetes deployments or pods.
I know this is an old question, but I've just had the same problem and spent some time figuring it out, so I'd like to share a more detailed solution.
Like Aayush Mall said, you'll need the PersistentVolume and PersistentVolumeClaim objects to create the volume and then link it to the pod (preferably via a Deployment object).
Basically, the PersistentVolume defines how and where the volume is stored on the host, and the PersistentVolumeClaim defines the constraints for binding the volume to some container.
From the docs:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see AccessModes).
So, let's say your pods are running in two nodes: mynode-1 and mynode-2.
Your PersistentVolume spec will look like this.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myapp-log-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/log/myapp
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - mynode-1
                - mynode-2
Your PersistentVolumeClaim like this.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-log-pvc
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: local-storage
  resources:
    requests:
      storage: 2Gi
  volumeName: myapp-log-pv
And then, you just have to tell the deployment object how to mount the volume inside the container. So, your Deployment spec will look like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  selector:
    matchLabels:
      app: myapp
  replicas: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myrepo/myapp:latest
          volumeMounts:
            - name: log
              mountPath: /var/log
      volumes:
        - name: log
          persistentVolumeClaim:
            claimName: myapp-log-pvc
And that's it. When your Deployment starts, it'll create the pod with the container, mount a volume named log at the path /var/log (inside the container) and bind this volume to a PV matching the requirements of the PVC named myapp-log-pvc. As we've created myapp-log-pv with the same volumeMode, accessModes and storageClassName fields, and with more storage capacity than that requested by myapp-log-pvc, they will be bound. So your app logs will be stored in the path /var/log/myapp (the spec.local.path field in the myapp-log-pv spec) on the node running the pod.
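To sanity-check the binding after applying the manifests (the file names are placeholders for wherever you saved the specs above):

kubectl apply -f myapp-log-pv.yaml
kubectl apply -f myapp-log-pvc.yaml
kubectl apply -f myapp-deploy.yaml
kubectl get pv myapp-log-pv     # STATUS should move to Bound
kubectl get pvc myapp-log-pvc   # should show it is bound to myapp-log-pv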
I hope it helps :)
Also, I'm kind of new to the Kubernetes world, so please let me know if you notice I misunderstood something or if there is a better way to do this.

How do I mount and format new google compute disk to be mounted in a GKE pod?

I have created a new disk in Google Compute Engine.
gcloud compute disks create --size=10GB --zone=us-central1-a dane-disk
It says I need to format it, but I have no idea how I could mount/access the disk.
gcloud compute disks list
NAME LOCATION LOCATION_SCOPE SIZE_GB TYPE STATUS
notowania-disk us-central1-a zone 10 pd-standard READY
New disks are unformatted. You must format and mount a disk before it can be used. You can find instructions on how to do this at:
https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
I tried the instructions above, but lsblk is not showing the disk at all.
Do I need to create a VM and somehow attach the disk to it in order to use it? My goal is to mount the disk as a persistent GKE volume independent of the VM (the last GKE upgrade caused recreation of the VM and data loss).
Thanks for the clarification of what you are trying to do in the comments.
I have 2 different answers here.
The first is that my testing shows that the Kubernetes GCE PD documentation is exactly right, and the warning about formatting seems like it can be safely ignored.
If you just issue:
gcloud compute disks create --size=10GB --zone=us-central1-a my-test-data-disk
And then use it in a pod:
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
    - image: nginx
      name: nginx-container
      volumeMounts:
        - mountPath: /test-pd
          name: test-volume
  volumes:
    - name: test-volume
      # This GCE PD must already exist.
      gcePersistentDisk:
        pdName: my-test-data-disk
        fsType: ext4
It will be formatted when it is mounted. This is likely because the fsType parameter instructs the system how to format the disk. You don't need to do anything with a separate GCE instance. The disk is retained even if you delete the pod or even the entire cluster. It is not reformatted on subsequent uses, and the data is kept around.
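If you want to convince yourself, here is a quick, non-authoritative check from inside the pod defined above:

kubectl exec test-pd -- df -h /test-pd   # shows the PD mounted with a filesystem on it
kubectl exec test-pd -- sh -c 'echo hello > /test-pd/hello && cat /test-pd/hello'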
So, the warning message from gcloud is confusing, but can be safely ignored in this case.
Now, in order to dynamically create a persistent volume based on GCE PD that isn't automatically deleted, you will need to create a new StorageClass that sets the Reclaim Policy to Retain, and then create a PersistentVolumeClaim based on that StorageClass. This also keeps basically the entire operation inside of Kubernetes, without needing to do anything with gcloud. Likewise, a similar approach is what you would want to use with a StatefulSet as opposed to a single pod, as described here.
Most of what you are looking to do is described in this GKE documentation about dynamically allocating PVCs as well as the Kubernetes StorageClass documentation. Here's an example:
gce-pd-retain-storageclass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-retained
reclaimPolicy: Retain
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: none
The above storage class is basically the same as the 'standard' GKE storage class, except with the reclaimPolicy set to Retain.
pvc-demo.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo-disk
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gce-pd-retained
  resources:
    requests:
      storage: 10Gi
Applying the above will dynamically create a disk that will be retained when you delete the claim.
And finally a demo-pod.yaml that mounts the PVC as a volume (this is really a nonsense example using nginx, but it demonstrates the syntax):
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
    - image: nginx
      name: nginx-container
      volumeMounts:
        - mountPath: /test-pd
          name: test-volume
  volumes:
    - name: test-volume
      persistentVolumeClaim:
        claimName: pvc-demo-disk
Now, if you apply these three in order, you'll get a container running using the PersistentVolumeClaim which has automatically created (and formatted) a disk for you. When you delete the pod, the claim keeps the disk around. If you delete the claim the StorageClass still keeps the disk from being deleted.
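In commands, using the file names above, that sequence is roughly:

kubectl apply -f gce-pd-retain-storageclass.yaml
kubectl apply -f pvc-demo.yaml
kubectl apply -f demo-pod.yaml
kubectl get pvc pvc-demo-disk   # Bound once provisioning completes
kubectl get pv                  # the dynamically created PV shows RECLAIM POLICY Retain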
Note that the PV that is left around after this won't be automatically reused, as the data is still on the disk. See the Kubernetes documentation about what you can do to reclaim it in this case. Really, this mostly says that you shouldn't delete the PVC unless you're ready to do work to move the data off the old volume.
Note that these disks will even continue to exist when the entire GKE cluster is deleted as well (and you will continue to be billed for them until you delete them).

Can I rely on volumeClaimTemplates naming convention?

I want to set up a pre-defined PostgreSQL cluster on bare-metal Kubernetes 1.7 with local PVs enabled. I have three worker nodes. I created a local PV on each node and deployed the StatefulSet successfully (with some complex scripting to set up Postgres replication).
However, I noticed that there's a kind of naming convention between the volumeClaimTemplates and the PersistentVolumeClaims.
For example
apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: postgres
spec:
  volumeClaimTemplates:
    - metadata:
        name: pgvolume
The created PVCs are pgvolume-postgres-0, pgvolume-postgres-1, pgvolume-postgres-2.
With some trickery, I manually created PVCs and bound them to the target PVs by selector. I tested the StatefulSet again, and it seems the StatefulSet is very happy to use these PVCs.
I finished my test successfully, but I still have this question: can I rely on the volumeClaimTemplates naming convention? Is this an undocumented feature?
Based on the StatefulSet API reference:
volumeClaimTemplates is a list of claims that pods are allowed to reference. The StatefulSet controller is responsible for mapping network identities to claims in a way that maintains the identity of a pod. Every claim in this list must have at least one matching (by name) volumeMount in one container in the template. A claim in this list takes precedence over any volumes in the template, with the same name.
So I guess you can rely on it.
Moreover, you can define a storage class to leverage dynamic provisioning of persistent volumes, so you won't have to create them manually.
volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: my-storage-class
      resources:
        requests:
          storage: 1Gi
Please refer to Dynamic Provisioning and Storage Classes in Kubernetes for more details.
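If you do decide to pre-create the claims yourself, as described in the question, one sketch that leans on the naming convention looks like this; the PV name is assumed, and the selector-based binding the asker used is swapped here for a simple volumeName reference:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pgvolume-postgres-0      # matches <claimTemplate>-<statefulSetName>-<ordinal>
spec:
  accessModes: [ "ReadWriteOnce" ]
  storageClassName: ""           # empty class so dynamic provisioning does not kick in
  volumeName: local-pv-node1     # assumed name of a pre-created local PV
  resources:
    requests:
      storage: 10Gi

You would create one such claim per replica (pgvolume-postgres-1, pgvolume-postgres-2) before creating the StatefulSet.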