Is it possible to mount disk to gke pod and compute engine - kubernetes

Is it possible to mount a disk to a GKE pod and a Compute Engine instance at the same time?
I have an Ubuntu disk of 10 GB.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-demo
spec:
  capacity:
    storage: 10G
  accessModes:
    - ReadWriteOnce
  claimRef:
    name: pv-claim-demo
  gcePersistentDisk:
    pdName: pv-test1
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pv-claim-demo
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10G
deployment.yaml
spec:
  containers:
  - image: wordpress
    name: wordpress
    ports:
    - containerPort: 80
      name: wordpress
    volumeMounts:
    - name: wordpress-persistent-storage
      mountPath: /app/logs
  volumes:
  - name: wordpress-persistent-storage
    persistentVolumeClaim:
      claimName: pv-claim-demo
The idea is to mount the log files generated by the pod to the disk and access them from a Compute Engine instance.
I cannot use NFS or hostPath to solve the problem. The other challenge is that multiple pods will be writing to the same PV.

The other challenge is that multiple pods will be writing to the same PV.
Yes, this does not work well unless you have a storage class similar to NFS. The default StorageClass in Google Kubernetes Engine only supports access mode ReadWriteOnce when dynamically provisioned, so only one replica can mount it.
The idea is to mount the log files generated by the pod to the disk and access them from a Compute Engine instance.
This is not a recommended solution for logs when using Kubernetes. An app on Kubernetes should follow the twelve-factor principles, and there is a specific item about logs: the app should log to stdout. For apps that do not follow the twelve-factor principles, this can be solved with a sidecar that tails the log files and prints them to stdout.
Logs that are printed to stdout are typically forwarded by the platform to a log collection system, as a service, so this is not something the app developer needs to be responsible for.
For how logs are handled by the platform in Google Kubernetes Engine, see the Google Cloud Operations suite for GKE.
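A rough sketch of that sidecar approach (the pod name, paths and commands below are illustrative placeholders, not taken from the question): the app and the tailing sidecar share an emptyDir volume, and the sidecar echoes the file to stdout so the platform collects it.
apiVersion: v1
kind: Pod
metadata:
  name: app-with-log-sidecar
spec:
  containers:
  - name: app
    image: busybox:stable
    # Stand-in for an app that writes to a log file instead of stdout.
    command: ["/bin/sh", "-c", "while true; do date >> /app/logs/app.log; sleep 5; done"]
    volumeMounts:
    - name: logs
      mountPath: /app/logs
  - name: log-tailer
    image: busybox:stable
    # The sidecar tails the log file and prints it to stdout for collection.
    command: ["/bin/sh", "-c", "tail -n+1 -F /app/logs/app.log"]
    volumeMounts:
    - name: logs
      mountPath: /app/logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
The sidecar's stdout is then visible via kubectl logs app-with-log-sidecar -c log-tailer and is picked up by Cloud Logging on GKE.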

You can't have many writers on a persistent disk. If you set the disk to read-only, many consumers can read from it (but not write, which doesn't match your use case).
The only solution for this is to use NFS-compliant storage. On Google Cloud, that is the Filestore service. It is designed exactly for your use case, and there is a tutorial for GKE.
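A rough sketch of how such a Filestore share is typically consumed from GKE as a static NFS-backed PersistentVolume (the server IP 10.0.0.2 and share name vol1 are placeholders for an existing Filestore instance):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: filestore-logs-pv
spec:
  capacity:
    storage: 1Ti
  accessModes:
    - ReadWriteMany
  nfs:
    server: 10.0.0.2   # placeholder: IP address of the Filestore instance
    path: /vol1        # placeholder: Filestore file share name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: filestore-logs-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: ""   # static binding, skip the default StorageClass
  resources:
    requests:
      storage: 1Ti
  volumeName: filestore-logs-pv
Because the access mode is ReadWriteMany, all pod replicas can mount the claim read-write, and the same share can be mounted over NFS from a Compute Engine VM, which addresses the original question.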

Better to use Google Cloud's operations suite for GKE (formerly known as Stackdriver).
There are two APIs that can be used to access this data from GCE:
Cloud Monitoring
Cloud Logging

Related

kubernetes PersistentVolume over google cloud storage bucket

I have created a persistent volume claim where I will store some ml model weights as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: models-storage
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: model-storage-bucket
  resources:
    requests:
      storage: 8Gi
However, this configuration will provision a disk on Compute Engine, and it is a bit cumbersome to copy stuff there and to upload/update any data. It would be so much more convenient if I could create a PersistentVolume abstracting a Google Cloud Storage bucket. However, I couldn't find a way to do this anywhere, including in the Google documentation. I am baffled because I would expect this to be a very common use case. Does anyone know how I can do that?
I was expecting to find something along the lines of
apiVersion: v1
kind: PersistentVolume
metadata:
  name: test-volume
spec:
  storageBucketPersistentDisk:
    pdName: gs://my-gs-bucket
To mount a Cloud Storage bucket you need to install the Google Cloud Storage FUSE CSI driver (not the persistent disk or Filestore driver) on your cluster, create the StorageClass, and then provision the bucket-backed storage either dynamically or statically, just as you would when using the persistent disk or Filestore CSI driver. Check out the link for detailed steps.
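A rough sketch of the static provisioning variant, assuming the Cloud Storage FUSE CSI driver is enabled on the cluster and a bucket named my-gs-bucket already exists (the storage class and object names are placeholders):
apiVersion: v1
kind: PersistentVolume
metadata:
  name: gcs-bucket-pv
spec:
  accessModes:
    - ReadWriteMany
  capacity:
    storage: 8Gi
  storageClassName: example-storage-class   # arbitrary name, must match the PVC
  mountOptions:
    - implicit-dirs
  csi:
    driver: gcsfuse.csi.storage.gke.io
    volumeHandle: my-gs-bucket               # the bucket name
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gcs-bucket-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: example-storage-class
  resources:
    requests:
      storage: 8Gi
  volumeName: gcs-bucket-pv
Note that the pod's Kubernetes service account also needs IAM access to the bucket (Workload Identity), which is covered in the linked documentation.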

Why is my Host Path Persistent Volume reachable from all pods?

I'm pretty stuck on this learning step of Kubernetes: PV and PVC.
What I'm trying to do here is understand how to handle a shared read-write volume across multiple pods.
What I understood is that a PVC cannot be shared between pods unless an NFS-like storage class has been configured.
I'm still on my hostPath StorageClass and I tried the following (Docker Desktop and a 3-node microK8s cluster):
This PVC with dynamic hostPath provisioning
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-desktop
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Mi
Deployment with 3 replicated pods writing to the same PVC.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 3
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: busybox
        image: library/busybox:stable
        command: ["/bin/sh"]
        args: ["-c", 'while true; do echo "1: $(hostname)" >> /root/index.html; sleep 2; done;']
        volumeMounts:
        - mountPath: /root
          name: vol-desktop
      volumes:
      - name: vol-desktop
        persistentVolumeClaim:
          claimName: pvc-desktop
Nginx server for serving volume content
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:stable
        volumeMounts:
        - mountPath: /usr/share/nginx/html
          name: vol-desktop
        ports:
        - containerPort: 80
      volumes:
      - name: vol-desktop
        persistentVolumeClaim:
          claimName: pvc-desktop
From what I understood of the documentation, this should not be possible, but in reality everything ran pretty smoothly and my Nginx server displayed the up-to-date index.html file just fine.
It actually worked on both a single-node cluster and a multi-node cluster.
What am I not getting here? Why does this work?
Is every pod mounting its own host path volume on start?
How can hostPath storage work between multiple nodes?
EDIT: For the multi-node case, a network share had been set up between the same storage path on each machine, which is why everything was replicated successfully. I hadn't understood that the same host path is created on each node when that PVC is mounted.
To anyone with the same problem: each node mounting this hostPath PVC will have its own folder created at the PV path.
So without network replication between nodes, only pods on the same node will share the same folder.
This is why it's discouraged on a multi-node cluster: the location of a pod on the cluster is unpredictable.
Thanks!
how to handle a shared read-write volume across multiple pods.
Redesign your application to avoid it. Multiple writers tend to be fragile and difficult to manage safely; you depend on your application correctly performing things like file locking, on the underlying shared filesystem implementation handling things properly, and on the system being tolerant of any sort of network hiccup that might happen.
The example you give is something that frequently appears in Docker Compose setups: have an application with a mix of backend code and static files, and then try to publish the static files at runtime through a volume to a reverse proxy. Instead, you can build an image that copies the static files at build time:
FROM nginx
ARG app_version=latest
COPY --from=my/app:${app_version} /app/static /usr/share/nginx/html
Have your CI system build this and push it immediately after the backend image is built. The resulting image serves the corresponding static files, but doesn't require a shared volume or any manual management of the volume contents.
For other types of content, consider storing data in a database, or use an object-storage service that maintains its own backing store and can handle the concurrency considerations. Then most of your pods can be totally stateless, and you can manage the data separately (maybe even outside Kubernetes).
How can hostPath storage work between multiple nodes?
It doesn't. It's an instruction to Kubernetes, on whichever node the pod happens to be scheduled on, to mount that host directory into the container. There's no management of any sort of the directory content; if two pods get scheduled on the same node, they'll share the directory, and if not, they won't; and if your pod's Deployment is updated and the pod is deleted and recreated somewhere else, it might not be the same node and might not have the same data.
With some very specific exceptions you shouldn't use hostPath volumes at all. The exceptions are things like log collectors run as DaemonSets, where there is exactly one pod on every node and you're interested in picking up the host-directory content that is different on each node.
In your specific setup either you're getting lucky with where the data producers and consumers are getting colocated, or there's something about your MicroK8s setup that's causing the host directories to be shared. It is not in general reliable storage.
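As an illustration of the DaemonSet exception mentioned above (the image and paths below are placeholders, not a prescription), a per-node log collector might mount the node's log directory like this:
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-log-collector
spec:
  selector:
    matchLabels:
      app: node-log-collector
  template:
    metadata:
      labels:
        app: node-log-collector
    spec:
      containers:
      - name: collector
        image: fluent/fluentd:edge        # placeholder log-collector image
        volumeMounts:
        - name: varlog
          mountPath: /var/log
          readOnly: true                  # the collector only reads node logs
      volumes:
      - name: varlog
        hostPath:
          path: /var/log                  # node directory; contents differ per node
Here the per-node, non-shared nature of hostPath is exactly the point: each collector pod picks up the logs of the node it runs on.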

Best practice on reading and writing to a persistent Volume in a live Kubernetes Cluster

I am designing a Kubernetes system which will require storing audio files. To do this I would like to set up a persistent storage volume making use of a StatefulSet.
I have found a few tutorials on how to set something like this up, but I am unsure, once I have created it, how to read/write the files. What would be the best approach to do this? I will be using a Flask app, but if I could just get a high-level approach then I can find the exact libraries myself.
Setting aside how this should be implemented programmatically and the specific tuning for dealing with audio files, you can use your PersistentVolume the same way you would read/write data in any directory (as correctly pointed out by user zerkms in the comments).
Answering this specific part of the question:
but I am unsure, once I have created it, how to read/write the files.
Assuming that you've created your StatefulSet in the following way:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: ubuntu-sts
spec:
  selector:
    matchLabels:
      app: ubuntu # has to match .spec.template.metadata.labels
  serviceName: "ubuntu"
  replicas: 1
  template:
    metadata:
      labels:
        app: ubuntu
    spec:
      terminationGracePeriodSeconds: 10
      containers:
      - name: ubuntu
        image: ubuntu
        command:
        - sleep
        - "infinity"
        volumeMounts:
        - name: audio-volume
          mountPath: /audio
  volumeClaimTemplates:
  - metadata:
      name: audio-volume
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: standard
      resources:
        requests:
          storage: 1Gi
Take a look at the part below (it shows where your volume will be mounted):
        volumeMounts:
        - name: audio-volume
          mountPath: /audio # <-- HERE!
Disclaimer!
This example has a 1:1 Pod-to-Volume relation. If your use case is different, you will need to refer to the Kubernetes documentation about accessModes.
You can exec into this Pod to see how you can further develop your application:
$ kubectl exec -it ubuntu-sts-0 -- /bin/bash
root@ubuntu-sts-0:/# echo "Hello from your /audio directory!" > /audio/hello.txt
root@ubuntu-sts-0:/# cat /audio/hello.txt
Hello from your /audio directory!
A side note!
If you are using a cloud-provider-managed Kubernetes cluster like GKE, EKS or AKS, please refer to its documentation about storage options.
I encourage you to check the official documentation on Persistent Volumes:
Kubernetes.io: Docs: Concepts: Storage: Persistent Volumes
Also, please take a look at the documentation regarding StatefulSets:
Kubernetes.io: Docs: Concepts: Workloads: Controllers: Statefulset
Additional resources:
Guru99.com: Reading and writing files in Python

How to store my pod logs in a persistent storage?

I have generated logs for my pods using kubectl logs <pod name>. But I want to persist these logs in a volume (some kind of persistent storage), because container logs will get wiped out if the pods go down. Is there a way to do this? Do I have to write some sort of a script?
I have read many answers but I still do not understand how to go about it; any help is appreciated. Thanks!
Under Logging Architecture, the Kubernetes documentation goes through a couple of ways to set up logging in your cluster.
The most interesting for you might be Cluster-level logging architecture:
While Kubernetes does not provide a native solution for cluster-level logging, there are several common approaches you can consider. Here are some options:
Use a node-level logging agent that runs on every node.
Include a dedicated sidecar container for logging in an application pod.
Push logs directly to a backend from within an application
There are many solutions for collecting pod logs and shipping them to a centralized location such as:
fluentd
splunk
elastic
Keeping logs outside of the cluster has benefits. If your cluster begins to have issues, it's more likely that your in-cluster logging architecture will also face them.
You will need to mount the logs directory inside the container to the host machine as well, using a PersistentVolume and PersistentVolumeClaim.
This way you can persist these logs even if the container is killed.
Create the PersistentVolume and PersistentVolumeClaim for the log path and use them as volume mounts in your Kubernetes deployments or pods.
I know this is an old question, but I've just had the same problem and spent some time figuring it out, so I'd like to share a more detailed solution.
Like Aayush Mall said, you'll need the PersistentVolume and PersistentVolumeClaim objects to create the volume and then link it to the pod (preferably via a Deployment object).
Basically, the PersistentVolume defines how and where the volume is stored on the host, and the PersistentVolumeClaim defines the constraints to bind the volume to some container.
From the docs:
A PersistentVolume (PV) is a piece of storage in the cluster that has been provisioned by an administrator or dynamically provisioned using Storage Classes. It is a resource in the cluster just like a node is a cluster resource. PVs are volume plugins like Volumes, but have a lifecycle independent of any individual Pod that uses the PV. This API object captures the details of the implementation of the storage, be that NFS, iSCSI, or a cloud-provider-specific storage system.
A PersistentVolumeClaim (PVC) is a request for storage by a user. It is similar to a Pod. Pods consume node resources and PVCs consume PV resources. Pods can request specific levels of resources (CPU and Memory). Claims can request specific size and access modes (e.g., they can be mounted ReadWriteOnce, ReadOnlyMany or ReadWriteMany, see AccessModes).
So, let's say your pods are running on two nodes: mynode-1 and mynode-2.
Your PersistentVolume spec will look like this.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: myapp-log-pv
spec:
  capacity:
    storage: 10Gi
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: local-storage
  local:
    path: /var/log/myapp
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - mynode-1
          - mynode-2
Your PersistentVolumeClaim will look like this.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-log-pvc
spec:
  volumeMode: Filesystem
  accessModes:
    - ReadWriteMany
  storageClassName: local-storage
  resources:
    requests:
      storage: 2Gi
  volumeName: myapp-log-pv
And then, you just have to tell the deployment object how to mount the volume inside the container. So, your Deployment spec will look like this.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp-deploy
spec:
  selector:
    matchLabels:
      app: myapp
  replicas: 1
  template:
    metadata:
      labels:
        app: myapp
    spec:
      containers:
      - name: myapp
        image: myrepo/myapp:latest
        volumeMounts:
        - name: log
          mountPath: /var/log
      volumes:
      - name: log
        persistentVolumeClaim:
          claimName: myapp-log-pvc
And that's it. When your deployment starts, it'll create the pod with the container, mount a volume named log at the path /var/log (inside the container) and bind this volume to a PV matching the requirements of the PVC named myapp-log-pvc. As we've created myapp-log-pv with the same volumeMode, accessModes and storageClassName fields, and with more storage capacity than that requested by myapp-log-pvc, they will be bound. So, your app logs will be stored at the path /var/log/myapp (the spec.local.path field in the myapp-log-pv spec) on the node running the pod.
I hope it helps :)
Also, I'm kinda new to the Kubernetes world, so please let me know if you notice I misunderstood something or if there is a better way to do this.

How do I mount and format new google compute disk to be mounted in a GKE pod?

I have created a new disk in Google Compute Engine.
gcloud compute disks create --size=10GB --zone=us-central1-a dane-disk
It says I need to format it, but I have no idea how I could mount/access the disk.
gcloud compute disks list
NAME LOCATION LOCATION_SCOPE SIZE_GB TYPE STATUS
notowania-disk us-central1-a zone 10 pd-standard READY
New disks are unformatted. You must format and mount a disk before it
can be used. You can find instructions on how to do this at:
https://cloud.google.com/compute/docs/disks/add-persistent-disk#formatting
I tried the instructions above, but lsblk does not show the disk at all.
Do I need to create a VM and somehow attach the disk to it in order to use it? My goal was to mount the disk as a persistent GKE volume independent of the VM (last time, a GKE upgrade caused recreation of the VM and data loss).
Thanks for the clarification of what you are trying to do in the comments.
I have 2 different answers here.
The first is that my testing shows that the Kubernetes GCE PD documentation is exactly right, and the warning about formatting seems like it can be safely ignored.
If you just issue:
gcloud compute disks create --size=10GB --zone=us-central1-a my-test-data-disk
And then use it in a pod:
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: nginx
    name: nginx-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    # This GCE PD must already exist.
    gcePersistentDisk:
      pdName: my-test-data-disk
      fsType: ext4
It will be formatted when it is mounted. This is likely because the fsType parameter instructs the system how to format the disk. You don't need to do anything with a separate GCE instance. The disk is retained even if you delete the pod or even the entire cluster. It is not reformatted on subsequent uses and the data is kept around.
So, the warning message from gcloud is confusing, but it can be safely ignored in this case.
Now, in order to dynamically create a persistent volume based on GCE PD that isn't automatically deleted, you will need to create a new StorageClass that sets the Reclaim Policy to Retain, and then create a PersistentVolumeClaim based on that StorageClass. This also keeps basically the entire operation inside of Kubernetes, without needing to do anything with gcloud. Likewise, a similar approach is what you would want to use with a StatefulSet as opposed to a single pod, as described here.
Most of what you are looking to do is described in this GKE documentation about dynamically allocating PVCs as well as the Kubernetes StorageClass documentation. Here's an example:
gce-pd-retain-storageclass.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: gce-pd-retained
reclaimPolicy: Retain
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-standard
  replication-type: none
The above storage class is basically the same as the 'standard' GKE storage class, except with the reclaimPolicy set to Retain.
pvc-demo.yaml:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-demo-disk
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: gce-pd-retained
  resources:
    requests:
      storage: 10Gi
Applying the above will dynamically create a disk that will be retained when you delete the claim.
And finally a demo-pod.yaml that mounts the PVC as a volume (this is really a nonsense example using nginx, but it demonstrates the syntax):
apiVersion: v1
kind: Pod
metadata:
  name: test-pd
spec:
  containers:
  - image: nginx
    name: nginx-container
    volumeMounts:
    - mountPath: /test-pd
      name: test-volume
  volumes:
  - name: test-volume
    persistentVolumeClaim:
      claimName: pvc-demo-disk
Now, if you apply these three in order, you'll get a container running using the PersistentVolumeClaim which has automatically created (and formatted) a disk for you. When you delete the pod, the claim keeps the disk around. If you delete the claim the StorageClass still keeps the disk from being deleted.
Note that the PV that is left around after this won't be automatically reused, as the data is still on the disk. See the Kubernetes documentation about what you can do to reclaim it in this case. Really, this mostly says that you shouldn't delete the PVC unless you're ready to do work to move the data off the old volume.
Note that these disks will even continue to exist when the entire GKE cluster is deleted as well (and you will continue to be billed for them until you delete them).
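For the StatefulSet variant mentioned earlier, here is a hedged sketch of how the retained StorageClass could be used via volumeClaimTemplates (the workload name and image are placeholders, not from the question):
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: demo-sts
spec:
  serviceName: demo
  replicas: 2
  selector:
    matchLabels:
      app: demo-sts
  template:
    metadata:
      labels:
        app: demo-sts
    spec:
      containers:
      - name: app
        image: nginx                        # placeholder workload
        volumeMounts:
        - name: data
          mountPath: /data
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: gce-pd-retained     # the StorageClass defined above
      resources:
        requests:
          storage: 10Gi
Each replica gets its own dynamically provisioned disk (claims named data-demo-sts-0, data-demo-sts-1), and because of the Retain policy those disks survive deletion of the claims.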