Kubernetes PersistentVolume over Google Cloud Storage bucket

I have created a PersistentVolumeClaim in which I will store some ML model weights, as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: model-storage-bucket
  resources:
    requests:
      storage: 8Gi
However, this configuration provisions a Compute Engine persistent disk, and it is a bit cumbersome to copy files there and to upload or update any data. It would be much more convenient if I could create a PersistentVolume backed by a Google Cloud Storage bucket. However, I couldn't find a way to do this anywhere, including the Google documentation. I am baffled, because I would expect this to be a very common use case. Does anyone know how I can do that?
I was expecting to find something along the lines of:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
spec:
  storageBucketPersistentDisk:
    pdName: gs://my-gs-bucket

To mount a Cloud Storage bucket, you need to install a Google Cloud Storage CSI driver (NOT the Persistent Disk or Filestore driver) on your cluster, create the StorageClass, and then provision the bucket-backed storage either dynamically or statically, just as you would with the Persistent Disk or Filestore CSI driver. Check out the link for detailed steps.
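As a rough sketch of the static route, assuming a GKE cluster with the managed Cloud Storage FUSE CSI driver (gcsfuse.csi.storage.gke.io) enabled, and assuming Workload Identity already grants the pod's Kubernetes service account access to the bucket, a bucket-backed PV, its claim and a consuming pod could look like this. Only the bucket name my-gs-bucket comes from the question; models-bucket-pv, gcs-static, models-ksa and model-server are hypothetical names, and the fields should be checked against the current GKE documentation before use.

```yaml
# Sketch only: static PV backed by a Cloud Storage bucket through the
# GKE Cloud Storage FUSE CSI driver. All names except the bucket are hypothetical.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: models-bucket-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 8Gi                  # required by the API; a bucket has no real size limit
  storageClassName: gcs-static    # placeholder class name, only used to match the PVC
  mountOptions:
    - implicit-dirs               # gcsfuse option: show directories implied by object paths
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-gs-bucket    # bucket name, without the gs:// prefix
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-storage
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 8Gi
  storageClassName: gcs-static
  volumeName: models-bucket-pv    # bind directly to the PV above
---
apiVersion: v1
kind: Pod
metadata:
  name: model-server
  annotations:
    gke-gcsfuse/volumes: "true"   # asks GKE to inject the gcsfuse sidecar into this pod
spec:
  serviceAccountName: models-ksa  # hypothetical KSA with bucket access via Workload Identity
  containers:
    - name: server
      image: busybox:1.36
      command: ["sh", "-c", "ls /models && sleep 3600"]
      volumeMounts:
        - name: models
          mountPath: /models
          readOnly: true
  volumes:
    - name: models
      persistentVolumeClaim:
        claimName: models-storage
```

With this setup, updating the model weights is a matter of uploading new objects to the bucket; the pods see them through the mount without any disk copy step.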

Related

Initializing a dynamically provisioned shared volume with ReadOnlyMany access mode

My GKE deployment consists of N pods (possibly on different nodes) and a shared volume, which is dynamically provisioned by pd.csi.storage.gke.io and is a Persistent Disk in GCP. I need to initialize this disk with data before the pods go live.
My problem is that I need to set accessModes to ReadOnlyMany and be able to mount the volume in all pods across different nodes in read-only mode, which I assume makes it effectively impossible to mount it in write mode in the initContainer.
Is there a solution to this issue? An answer to this question suggests a good solution for the case where each pod has its own disk mounted, but I need one disk shared among all pods, since my data is quite large.
With GKE 1.21 and later, you can enable the managed Filestore CSI driver in your clusters. You can enable the driver for new clusters:
gcloud container clusters create CLUSTER_NAME \
--addons=GcpFilestoreCsiDriver ...
or update existing clusters:
gcloud container clusters update CLUSTER_NAME \
--update-addons=GcpFilestoreCsiDriver=ENABLED
Once you've done that, create a storage class (or have your platform admin do it):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: filestore-example
provisioner: filestore.csi.storage.gke.io
volumeBindingMode: WaitForFirstConsumer
allowVolumeExpansion: true
parameters:
  tier: standard
  network: default
After that, you can use PVCs and dynamic provisioning:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: podpvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: filestore-example
  resources:
    requests:
      storage: 1Ti
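To tie this back to the initialization problem in the question, one possible approach (a sketch, not the only way) is to let a one-off Job write the data into the ReadWriteMany claim podpvc, and have the N serving pods mount the same claim read-only. The Job and Deployment names, the busybox image and the placeholder write command are all hypothetical stand-ins for the real data-loading step.

```yaml
# Hypothetical one-off Job that populates the shared Filestore volume once.
apiVersion: batch/v1
kind: Job
metadata:
  name: init-shared-data
spec:
  backoffLimit: 2
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: loader
          image: busybox:1.36
          # Placeholder for the real initialization (e.g. copying model files)
          command: ["sh", "-c", "echo 'initial data' > /data/ready.txt"]
          volumeMounts:
            - name: shared
              mountPath: /data
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: podpvc
---
# The serving pods mount the same claim, but read-only.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: readers
spec:
  replicas: 3
  selector:
    matchLabels:
      app: readers
  template:
    metadata:
      labels:
        app: readers
    spec:
      containers:
        - name: app
          image: busybox:1.36
          command: ["sh", "-c", "cat /data/ready.txt && sleep 3600"]
          volumeMounts:
            - name: shared
              mountPath: /data
              readOnly: true
      volumes:
        - name: shared
          persistentVolumeClaim:
            claimName: podpvc
            readOnly: true
```

Applying both manifests at once does not order them, so the Deployment should be applied only after the Job has completed (for example after kubectl wait --for=condition=complete job/init-shared-data).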
...I need to have one disk shared among all pods
You can try Filestore. First, you create a Filestore instance and save your data on a Filestore volume. Then you install the Filestore driver on your cluster. Finally, you share the data with the pods that need to read it, using a PersistentVolume that refers to the Filestore instance and volume above.
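A minimal sketch of that last step, assuming the managed Filestore CSI driver (filestore.csi.storage.gke.io) is enabled and the instance already holds the data. The upper-case values are placeholders for your instance's location, name, share name and IP, and shared-data-pv / shared-data-pvc are hypothetical names; the volumeHandle format should be verified against the current GKE documentation.

```yaml
# Sketch: expose an existing, pre-filled Filestore share as a static PV.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: shared-data-pv
spec:
  storageClassName: ""                    # static PV, no dynamic provisioning
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain   # keep the instance if the PV is deleted
  volumeMode: Filesystem
  csi:
    driver: filestore.csi.storage.gke.io
    volumeHandle: "modeInstance/LOCATION/INSTANCE_NAME/SHARE_NAME"
    volumeAttributes:
      ip: INSTANCE_IP
      volume: SHARE_NAME
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: shared-data-pvc
spec:
  storageClassName: ""                    # must match the PV's (empty) class
  volumeName: shared-data-pv              # bind explicitly to the PV above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Ti
```

Pods that only read the data can additionally set readOnly: true on their volumeMounts entry.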

Is it possible to mount disk to gke pod and compute engine

Is it possible to mount disk to gke pod and compute engine at the same time.
I have an Ubuntu disk of 10 GB:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 10G
  accessModes:
    - ReadWriteOnce
  claimRef:
    name: pv-claim-demo
  gcePersistentDisk:
    pdName: pv-test1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
deployment.yaml:
spec:
  containers:
    - image: wordpress
      name: wordpress
      ports:
        - containerPort: 80
          name: wordpress
      volumeMounts:
        - name: wordpress-persistent-storage
          mountPath: /app/logs
  volumes:
    - name: wordpress-persistent-storage
      persistentVolumeClaim:
        claimName: pv-claim-demo
The idea is to write the log files generated by the pod to the disk and access them from Compute Engine.
I cannot use NFS or hostPath to solve the problem. The other challenge is that multiple pods will be writing to the same PV.
The other challenge is that multiple pods will be writing to the same PV.
Yes, this does not work well unless you have a storage class similar to NFS. The default storageClass in Google Kubernetes Engine only supports the ReadWriteOnce access mode when dynamically provisioned, so the volume can be mounted read-write by only a single node at a time, which rules out replicas spread across nodes.
The idea is to write the log files generated by the pod to the disk and access them from Compute Engine.
This is not a recommended solution for logs on Kubernetes. An app on Kubernetes should follow the 12-factor principles, and for this problem there is a specific item about logs: the app should log to stdout. For apps that do not follow the 12-factor principles, this can be solved with a sidecar that tails the log files and prints them to stdout.
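A minimal sketch of such a sidecar, modeled on the streaming-sidecar pattern from the Kubernetes logging documentation. It assumes the app writes to a file named app.log under /app/logs (the file name, pod name and busybox tag are hypothetical) and shares that directory with the sidecar through an emptyDir volume instead of a persistent disk.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: wordpress-with-log-sidecar      # hypothetical name
spec:
  containers:
    - name: wordpress
      image: wordpress
      volumeMounts:
        - name: app-logs
          mountPath: /app/logs          # the app writes its log files here
    - name: log-tailer
      image: busybox:1.36
      # Follow the log file (retrying until it exists) and copy it to this
      # container's stdout, where the node's logging agent collects it.
      args: [/bin/sh, -c, 'tail -n+1 -F /app/logs/app.log']
      volumeMounts:
        - name: app-logs
          mountPath: /app/logs
          readOnly: true
  volumes:
    - name: app-logs
      emptyDir: {}                      # pod-scoped scratch space shared by both containers
```

The sidecar's output then appears under kubectl logs wordpress-with-log-sidecar -c log-tailer and is forwarded by the platform like any other stdout stream.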
Logs that are printed to stdout are typically forwarded by the platform to a log collection system, often offered as a service. So this is not something the app developer needs to be responsible for.
For how logs are handled by the platform in Google Kubernetes Engine, see Google Cloud's operations suite for GKE.
You can't have many writers on a Persistent Disk. If you attach your disk read-only, many pods can read from it (but none can write, which doesn't match your use case).
The only solution for this is to use NFS-compliant storage. On Google Cloud, that's the Filestore service. It's designed exactly for your use case, and there is a tutorial for GKE.
A better option is to use Google Cloud's operations suite for GKE (formerly known as Stackdriver).
There are two APIs that can be used to access the data from GCE:
Cloud Monitoring
Cloud Logging

Do I have to explicitly create Persistent Volume when I am using Persistent Volume Claim?

I am new to Kubernetes, and I struggle to understand the whole idea behind Persistent Storage in Kubernetes.
So is this enough, or do I have to create a PersistentVolume, and what will happen if I deploy only these two objects without creating a PV?
The storage should be on the local machine.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  creationTimestamp: null
  name: nginx-logs
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
status: {}
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app-web
  name: app-web
spec:
  selector:
    matchLabels:
      app: app-web
  replicas: 1
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: app-web
    spec:
      containers:
        - image: nginx:1.14.2
          imagePullPolicy: Always
          name: app-web
          volumeMounts:
            - mountPath: /var/log/nginx
              name: nginx-logs
      restartPolicy: Always
      volumes:
        - name: nginx-logs
          persistentVolumeClaim:
            claimName: nginx-logs
I struggle to understand whole idea behind Persistent Storage in Kubernetes
The idea is to separate the storage request that the app needs from the physical storage, so that an app can be moved to, e.g., another cloud provider with a different storage system without needing any changes in the app. It also separates the responsibility for "requesting storage" from managing the underlying storage, e.g. developers vs. operations.
So is this enough, or do I have to create a PersistentVolume, and what will happen if I deploy only these two objects without creating a PV?
This depends on your environment. Most environments typically have Dynamic Volume Provisioning, e.g. the big cloud providers, and Minikube now supports it as well.
When using dynamic volume provisioning, the developer only has to create a PersistentVolumeClaim and no PersistentVolume; the PV is instead dynamically provisioned.
A PV that matches the PVC in terms of capacity (at least 100Mi) and accessModes (ReadWriteOnce) is needed. If you don't have one, you get the error "pod has unbound immediate PersistentVolumeClaims". So either you create the PV manually (called static provisioning) or rely on a storage class and volume driver to do it automatically (called dynamic provisioning).
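For the static case on a single-node local cluster, a sketch could pair a hostPath PV with the nginx-logs claim from the question. The PV name and the node directory /mnt/data/nginx-logs are hypothetical. Both objects set storageClassName to the empty string so that a default StorageClass, if one exists, is not injected into the claim and dynamic provisioning stays out of the way.

```yaml
# Sketch: static provisioning for the nginx-logs claim on a single-node cluster.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-logs-pv              # hypothetical name
spec:
  capacity:
    storage: 100Mi                 # must be >= the claim's request
  accessModes:
    - ReadWriteOnce                # must include the mode the claim asks for
  storageClassName: ""             # no class: only class-less claims can bind
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data/nginx-logs     # hypothetical directory on the node
    type: DirectoryOrCreate
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-logs
spec:
  storageClassName: ""             # explicit "" disables dynamic provisioning for this claim
  volumeName: nginx-logs-pv        # optional: bind to this PV explicitly
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
```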
A PV is the Kubernetes representation of a volume. The actual volume still needs to be provisioned. If you use dynamic volume provisioning and your cloud provider has a volume driver, the provisioning is automated by the driver, without needing to create a PV manually.
On a local, non-cloud system you can use the local path provisioner for dynamic provisioning. Configure a default storage class and you can avoid manual PV creation.
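As an illustration, assuming Rancher's local-path-provisioner is installed (its deployment manifest normally creates a StorageClass much like this one), marking the class as the default is what lets a claim that omits storageClassName, such as the nginx-logs PVC, get a PV automatically:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-path
  annotations:
    # PVCs that omit storageClassName are assigned this class
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: rancher.io/local-path
volumeBindingMode: WaitForFirstConsumer   # provision once a pod using the PVC is scheduled
reclaimPolicy: Delete                     # remove the node directory when the PVC is deleted
```

Because of WaitForFirstConsumer, the PVC stays Pending until the Deployment's pod is scheduled, which is expected behaviour rather than an error.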
The docs describe a dynamic way of provisioning persistent volumes:
When none of the static PVs the administrator created match a user's PersistentVolumeClaim, the cluster may try to dynamically provision a volume specially for the PVC. This provisioning is based on StorageClasses: the PVC must request a storage class and the administrator must have created and configured that class for dynamic provisioning to occur. Claims that request the class "" effectively disable dynamic provisioning for themselves.
When I apply your pvc, it creates a gp2 volume because that is my default storage class.
$ kubectl get pvc
NAME         STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
nginx-logs   Bound    pvc-b95a9d0c-ef46-4ff0-a034-d2dde1ac1f96   1Gi        RWO            gp2            6s
You can see your default storage class like this:
$ kubectl get storageclass
NAME            PROVISIONER             AGE
gp2 (default)   kubernetes.io/aws-ebs   355d
You can also include a specific storage class. You can read more about that here.
Users request dynamically provisioned storage by including a storage class in their PersistentVolumeClaim
In short, your current pvc file will use your default storage class to create a Persistent Volume and then you are mounting that volume into your pod via your Persistent Volume Claim in your Deployment.
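For comparison, requesting a specific class instead of relying on the default only takes one extra field in the claim. This sketch reuses the gp2 class from the output above; any class name listed by kubectl get storageclass works the same way.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-logs
spec:
  storageClassName: gp2     # request this class explicitly instead of the default
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 100Mi
```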

How to store my pod logs in a persistent storage?

I have generated logs for my pods using kubectl logs <pod-name>. But I want to persist these logs in a volume (some kind of persistent storage), because container logs are wiped out if the pods go down. Is there a way to do this? Do I have to write some sort of script?
I have read many answers but I still do not understand how to go about it, any help is appreciated. Thanks!
Under Logging Architecture, the Kubernetes documentation goes through a couple of ways to set up logging in your cluster.
The most interesting for you might be Cluster-level logging architecture:
While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:
Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application
There are many solutions for collecting pod logs and shipping them to a centralized location such as:
fluentd
splunk
elastic
Keeping logs outside of the cluster has benefits: if your cluster begins to have issues, it's likely that an in-cluster logging architecture will face them too.
You will need to mount the logs directory inside the container to the host machine as well, using the PersistentVolume and PersistentVolumeClaim.
This way you can persist these logs even if the container is killed.
Create the PersistentVolume and PersistentVolumeClaim for the log path and use them as volume mounts to the kubernetes deployments or pods.
I know this is an old question, but I've just had the same problem and I've spent some time to figure out the solution, so I'd like to share a more detailed solution.
Like Aayush Mall said, you'll need the PersistentVolume and PersistentVolumeClaim objects to create the volume and then link it to the pod (preferably via a Deployment object).
Basically, the PersistentVolume defines how and where the volume is stored on the host, and the PersistentVolumeClaim defines the constraints for binding the volume to some container.
From the docs:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see AccessModes).
So, let's say your pods are running in two nodes: mynode-1 and mynode-2.
Your PersistentVolume spec will look like this.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myapp-log-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/log/myapp
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: kubernetes.io/hostname
              operator: In
              values:
                - mynode-1
                - mynode-2
Your PersistentVolumeClaim like this.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-log-pvc
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: local-storage
  resources:
    requests:
      storage: 2Gi
  volumeName: myapp-log-pv
And then, you just have to tell the deployment object how to mount the volume inside the container. So, your Deployment spec will look like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  selector:
    matchLabels:
      app: myapp
  replicas: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
        - name: myapp
          image: myrepo/myapp:latest
          volumeMounts:
            - name: log
              mountPath: /var/log
      volumes:
        - name: log
          persistentVolumeClaim:
            claimName: myapp-log-pvc
And that's it. When your deployment starts, it'll create the pod with the container, mount a volume named log at the path /var/log (inside the container), and bind this volume to a PV matching the requirements of the PVC named myapp-log-pvc. Because myapp-log-pv was created with the same volumeMode, accessModes and storageClassName fields, and with more storage capacity than required by myapp-log-pvc, they will be bound. So your app logs will be stored in the path /var/log/myapp (field spec.local.path in the myapp-log-pv spec) on the node running the pod.
I hope it helps :)
Also, I'm kinda new in the kubernetes world, so please let me know if you notice I misunderstood something or if there is a better way to do this.
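The manifests above reference storageClassName: local-storage without defining that class. Static binding only needs the names to match, but the Kubernetes documentation recommends creating a class like the following for local volumes, so that binding is delayed until the scheduler has picked a node that satisfies the PV's nodeAffinity. The class name is taken from the answer; the rest is a sketch of that recommended setup.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner   # local PVs are created by hand, not dynamically
volumeBindingMode: WaitForFirstConsumer     # bind only when a pod using the PVC is scheduled
```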

Kubernetes Persistent Volume Claim Indefinitely in Pending State

I created a PersistentVolume sourced from a Google Compute Engine persistent disk that I had already formatted and provisioned with data. Kubernetes says the PersistentVolume is available.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: models-1-0-0
  labels:
    name: models-1-0-0
spec:
  capacity:
    storage: 200Gi
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: models-1-0-0
    fsType: ext4
    readOnly: true
I then created a PersistentVolumeClaim so that I could attach this volume to multiple pods across multiple nodes. However, kubernetes indefinitely says it is in a pending state.
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: models-1-0-0-claim
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 200Gi
  selector:
    matchLabels:
      name: models-1-0-0
Any insights? I feel there may be something wrong with the selector...
Is it even possible to preconfigure a persistent disk with data and have pods across multiple nodes all be able to read from it?
I quickly realized that PersistentVolumeClaim defaults the storageClassName field to standard when not specified. However, when creating a PersistentVolume, storageClassName does not have a default, so the selector doesn't find a match.
The following worked for me:
kind: PersistentVolume
apiVersion: v1
metadata:
  name: models-1-0-0
  labels:
    name: models-1-0-0
spec:
  capacity:
    storage: 200Gi
  storageClassName: standard
  accessModes:
    - ReadOnlyMany
  gcePersistentDisk:
    pdName: models-1-0-0
    fsType: ext4
    readOnly: true
---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: models-1-0-0-claim
spec:
  accessModes:
    - ReadOnlyMany
  resources:
    requests:
      storage: 200Gi
  selector:
    matchLabels:
      name: models-1-0-0
With dynamic provisioning, you shouldn't have to create PVs and PVCs separately. In Kubernetes 1.6+, there are default provisioners for GKE and some other cloud environments, which should let you just create a PVC and have it automatically provision a PV and an underlying Persistent Disk for you.
For more on dynamic provisioning, see:
https://kubernetes.io/blog/2017/03/dynamic-provisioning-and-storage-classes-kubernetes/
I had the same issue, but for a different reason, so I am sharing it here to help the community.
If you delete a PersistentVolumeClaim and then re-create it with the same definition, it will be Pending forever. Why?
For manually created PersistentVolumes, persistentVolumeReclaimPolicy defaults to Retain. If we delete the PersistentVolumeClaim, the PersistentVolume still exists and the volume is considered released. But it is not yet available for another claim, because the previous claimant's data remains on the volume.
So you need to manually reclaim the volume with the following steps:
Delete the PersistentVolume (associated underlying storage asset/resource like EBS, GCE PD, Azure Disk, ...etc will NOT be deleted, still exists)
(Optional) Manually clean up the data on the associated storage asset accordingly
(Optional) Manually delete the associated storage asset (EBS, GCE PD, Azure Disk, ...etc)
If you still need the same data, you may skip cleaning and deleting the associated storage asset (steps 2 and 3 above): simply re-create a new PersistentVolume with the same storage asset definition, and then you should be able to create the PersistentVolumeClaim again.
One last thing to mention, Retain is not the only option for persistentVolumeReclaimPolicy, below are some other options that you may need to use or try based on use-case scenarios:
Recycle: performs a basic scrub on the volume (rm -rf /thevolume/*) and makes it available again for a new claim. Only NFS and HostPath support recycling, and the Recycle policy is deprecated in favor of dynamic provisioning.
Delete: Associated storage asset such as AWS EBS, GCE PD, Azure Disk, or OpenStack Cinder...etc volume is deleted
For more information, please check kubernetes documentation.
Still need more clarification or have any questions, please don't hesitate to leave a comment and I will be more than happy to clarify and assist.
If you're using Microk8s, you have to enable storage before you can start a PersistentVolumeClaim successfully.
Just do:
microk8s.enable storage
You'll need to delete your deployment and start again.
You may also need to manually delete the "pending" PersistentVolumeClaims because I found that uninstalling the Helm chart which created them didn't clear the PVCs out.
You can do this by first finding a list of names:
kubectl get pvc --all-namespaces
then deleting each name with:
kubectl delete pvc name1 name2 etc...
Once storage is enabled, reapplying your deployment should get things going.
I was facing the same problem, and realized that k8s actually does a just-in-time provision, i.e.:
When a PVC is created, it stays in the PENDING state, and no corresponding PV is created.
The PV (EBS volume) is created, and the PVC bound, only after you have created a Deployment that uses the PVC.
I am using EKS with kubernetes version 1.16 and the behaviour is controlled by StorageClass Volume Binding Mode.
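For reference, the default gp2 StorageClass on EKS clusters of that era looked roughly like the following (check yours with kubectl get storageclass gp2 -o yaml). The volumeBindingMode: WaitForFirstConsumer line is what produces the just-in-time behaviour described above; with Immediate, the EBS volume would be created as soon as the PVC is.

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gp2
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: kubernetes.io/aws-ebs
parameters:
  type: gp2
  fsType: ext4
volumeBindingMode: WaitForFirstConsumer   # PVC stays Pending until a pod uses it
```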
I had same problem. My PersistentVolumeClaim yaml was originally as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  volumeName: pv
  resources:
    requests:
      storage: 1Gi
and my pvc status was:
After removing volumeName:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc
spec:
  storageClassName: ""
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
I've seen this behaviour in microk8s 1.14.1 when two PersistentVolumes have the same value for spec/hostPath/path, e.g.
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv-name
  labels:
    type: local
    app: app
spec:
  storageClassName: standard
  capacity:
    storage: 5Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/mnt/k8s-app-data"
It seems that MicroK8s is event-based (which isn't necessary on a one-node cluster) and throws away information about any failing operations, resulting in unnecessarily poor feedback for almost all failures.
I had this problem with the Apache Airflow (stable) Helm chart; setting storageClass to azurefile helped. What should you do in such cases with cloud providers? Just search for the storage classes that support the needed access mode. ReadWriteMany means that the volume can be mounted read-write by many nodes simultaneously. In this case (Azure), that is azurefile.
logs:
  path: /opt/airflow/logs
  ## configs for the logs PVC
  ##
  persistence:
    ## if a persistent volume is mounted at `logs.path`
    ##
    enabled: true
    ## the name of an existing PVC to use
    ##
    existingClaim: ""
    ## sub-path under `logs.persistence.existingClaim` to use
    ##
    subPath: ""
    ## the name of the StorageClass used by the PVC
    ##
    ## NOTE:
    ## - if set to "", then `PersistentVolumeClaim/spec.storageClassName` is omitted
    ## - if set to "-", then `PersistentVolumeClaim/spec.storageClassName` is set to ""
    ##
    storageClass: "azurefile"
    ## the access mode of the PVC
    ##
    ## WARNING:
    ## - must be: `ReadWriteMany`
    ##
    ## NOTE:
    ## - different StorageClass support different access modes:
    ##   https://kubernetes.io/docs/concepts/storage/persistent-volumes/#access-modes
    ##
    accessMode: ReadWriteMany
    ## the size of PVC to request
    ##
    size: 1Gi
When you want to manually bind a PVC to a PV backed by an existing disk, storageClassName should not be specified. However, the cloud provider has set the "standard" StorageClass as the default, so it is always filled in, whatever you try when patching the PVC/PV.
You can check whether your provider set it as the default by running kubectl get storageclass (it will be marked "(default)").
To fix this, the best option is to get your existing StorageClass YAML and add this annotation:
annotations:
  storageclass.kubernetes.io/is-default-class: "false"
Then apply it, and you're good :)
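An alternative that leaves the default StorageClass untouched is to set storageClassName to the empty string explicitly on both objects: an explicit "" is not replaced by the default class, and it disables dynamic provisioning for that claim. A sketch for an existing GCE persistent disk, assuming the Compute Engine PD CSI driver (pd.csi.storage.gke.io) is in use; PROJECT_ID, ZONE and DISK_NAME are placeholders and the object names are hypothetical.

```yaml
# Sketch: bind a PVC to a PV for an existing disk without changing the default class.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: existing-disk-pv
spec:
  storageClassName: ""          # no class, so only a class-less PVC can bind
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  csi:
    driver: pd.csi.storage.gke.io
    volumeHandle: projects/PROJECT_ID/zones/ZONE/disks/DISK_NAME
    fsType: ext4
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: existing-disk-pvc
spec:
  storageClassName: ""          # explicit "" keeps the default class from being injected
  volumeName: existing-disk-pv  # bind to the PV above by name
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
```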
I am using MicroK8s.
I fixed the problem by running the command below:
systemctl start open-iscsi.service
(I had installed open-iscsi earlier using apt install open-iscsi, but had not started it.)
Then I enabled storage as follows:
microk8s.enable storage
Then I deleted the StatefulSets and the pending PersistentVolumeClaims from Lens so I could start over.
Worked well after that.
I faced the same issue, in which the PersistentVolumeClaim was in the Pending phase indefinitely. I tried providing the storageClassName as 'default' in the PersistentVolume, just as I did for the PersistentVolumeClaim, but it did not fix the issue.
I made one change in my persistentvolume.yml: I moved the PersistentVolumeClaim config to the top of the file, with the PersistentVolume as the second config. That fixed the issue.
We need to make sure that the PersistentVolumeClaim is created first and the PersistentVolume afterwards to resolve this 'Pending' phase issue.
I am posting this answer after testing it a few times, hoping that it might help someone struggling with it.
Make sure your VM also has enough disk space.