Mount a shared volume to a Kubernetes cluster so that all users can access the same storage and share files

I am following Zero to JupyterHub with Kubernetes to create a JupyterHub environment for my team to use.
I am using Google Kubernetes Engine, and every user gets their own storage where files are stored - this setup works fine.
I am having trouble figuring out how to create a volume or shared storage so that everyone on the team can see each other's notebooks and share files and data.
To explain further, in the desired setup, when a user signs in and opens their Jupyter server, every user sees the same "shared" folder; each user can create individual folders for themselves inside it, but is also able to reuse code that someone else has already written.
I looked into NFS with Filestore, but that seems very expensive.

As noted in the documentation, gcePersistentDisk does not support being mounted read-write by multiple nodes (there is no ReadWriteMany access mode for it).
There is an alternative solution to the problem: Rook, a storage orchestrator that makes various storage provisioners available on Kubernetes. One of them is Ceph, which provides a shared filesystem (CephFS) solution on Kubernetes.
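As a rough sketch of how this can be wired up - assuming a Rook/Ceph cluster exposing a ReadWriteMany-capable storage class (the class name rook-cephfs and the claim name jupyterhub-shared below are assumptions, not something from the question) - a shared claim could look like this:

# PersistentVolumeClaim backed by a CephFS (ReadWriteMany) storage class.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jupyterhub-shared
spec:
  accessModes:
    - ReadWriteMany              # multiple pods may mount it read-write
  storageClassName: rook-cephfs  # assumed Rook/CephFS storage class name
  resources:
    requests:
      storage: 50Gi

The claim could then be mounted into every single-user server via the Zero to JupyterHub Helm values (singleuser.storage.extraVolumes and extraVolumeMounts are chart options; the mount path is just an example):

# Excerpt from the Zero to JupyterHub config.yaml
singleuser:
  storage:
    extraVolumes:
      - name: shared
        persistentVolumeClaim:
          claimName: jupyterhub-shared   # the claim defined above
    extraVolumeMounts:
      - name: shared
        mountPath: /home/jovyan/shared   # appears as "shared" in every notebook server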

List buckets and their content using ofek.dev/csi-gcs

I deployed the static CSI driver https://ofek.dev/csi-gcs for Google Cloud.
Writing works fine: anything written by the pod in the mount directory shows up in the bucket.
However, reading only gives access to data written by the pod itself; it cannot list the buckets and objects that were not created by the pod.
I implemented the static provisioning step by step as explained in the link.
By the way, listing (dir) a bucket that I know exists does not solve the problem either.
I also gave the GSA the roles/editor role, so it should be able to list the buckets.

Without a retention policy or lifecycle rules, would Google Cloud Storage automatically delete files?

My app uses Google Cloud Storage through Firebase with Java, Angular & Flutter. It stores pictures and such there. Now, a lot of older files recently disappeared from Google Cloud Storage. A test version of my app is probably the culprit. But I want to make sure that I got the storage bucket configured correctly.
Please note that I don't have object versioning enabled. From what I know, it would keep a copy of deleted files around. That's why I plan to enable it in the future. But it doesn't help me with files deleted in the past.
Right now, my storage bucket is configured as follows:
Default storage class: Standard
Object versioning: Off
Retention policy: None
Lifecycle rules: None
So with that configuration, would Google Cloud Storage automatically delete files? Like, say, after a year or so?
No. If you don't ask Cloud Storage to delete your files, your files will stay around forever. There's no expiration of any sort by default. Cloud Storage is a popular tool for long term storage/backup/retention.
If you want to be especially careful not to delete certain objects, retention policies and object holds can be used to make it harder to delete objects by accident. For example, if you wanted to temporarily ensure that your scripts would not delete your most important object, you could run:
gsutil retention temp set gs://my_bucket_name/my_important_file.txt
This would set a "temporary object hold" on the object, which would make it so that my_important_file.txt could not be deleted with the delete command until you released the hold.

How to manage shared notebook permissions in a Zero to JupyterHub deployment (Kubernetes)?

I'm running JupyterHub on Kubernetes using the Zero to JupyterHub chart.
Currently I've got a persistent volume that is mounted on each single-user server and used as a shared folder; all users can read from and write to it.
The desired state is a shared folder that contains all users' folders, such that each user has write permission to their own folder only.
The problems are:
All single-user pods run as the same user (jovyan) and the same group
When changing permissions on subfolders of the shared/ folder, the permissions change in all mounted instances (e.g. in all single-user pods)
Is there any way to manage these permissions without managing Linux users in parallel to the JupyterHub users? If there isn't, are there any convenient approaches (maybe an open-source custom JupyterHub authenticator or single-user image) that help with this?

Copying directories into minikube and persisting them

I am trying to copy some directories into the minikube VM to be used by some of the pods that are running. These include API credential files and template files used at run time by the application. I have found you can copy files into the /home/docker/ directory using scp; however, these files are not persisted over reboots of the VM. I have read that files/directories are persisted if stored in the /data/ directory on the VM (among others), but I get permission denied when trying to copy files to these directories.
Are there:
A: Any directories in minikube that will persist data that aren't protected in this way
B: Any other ways of doing the above without running into this issue (could well be going about this the wrong way)
To clarify, I have already been able to mount the files from /home/docker/ into the pods using volumes, so it's just the persisting data I'm unclear about.
Kubernetes has dedicated object types for these sorts of things. API credential files you might store in a Secret, and template files (if they aren't already built into your Docker image) could go into a ConfigMap. Both of them can either be exposed as environment variables or mounted as artificial volumes in running containers.
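A minimal sketch of the volume-mount approach (all names here - the Secret, ConfigMap, image, and mount paths - are made up for illustration):

# Secret holding API credentials, mounted read-only into the pod.
apiVersion: v1
kind: Secret
metadata:
  name: api-credentials            # hypothetical name
type: Opaque
stringData:
  credentials.json: '{"token": "replace-me"}'
---
# ConfigMap holding a template file used at run time.
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-templates              # hypothetical name
data:
  greeting.tmpl: "Hello, {{ name }}!"
---
# Pod mounting both of the above as volumes.
apiVersion: v1
kind: Pod
metadata:
  name: example-app
spec:
  containers:
    - name: app
      image: example/app:latest    # hypothetical image
      volumeMounts:
        - name: creds
          mountPath: /etc/app/credentials
          readOnly: true
        - name: templates
          mountPath: /etc/app/templates
          readOnly: true
  volumes:
    - name: creds
      secret:
        secretName: api-credentials
    - name: templates
      configMap:
        name: app-templates

The files then appear inside the container at /etc/app/credentials/credentials.json and /etc/app/templates/greeting.tmpl, regardless of which node the pod is scheduled on.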
In my experience, trying to store data directly on a node isn't a good practice. It's common enough to have multiple nodes, to not directly have login access to those nodes, and for them to be created and destroyed outside of your direct control (imagine an autoscaler running on a cloud provider that creates a new node when all of the existing nodes are 90% scheduled). There's a good chance your data won't (or can't) be on the host where you expect it.
This does lead to a proliferation of Kubernetes objects and associated resources, and you might find a Helm chart to be a good resource to tie them together. You can check the chart into source control along with your application, and deploy the whole thing in one shot. While it has a couple of useful features beyond just packaging resources together (a deploy-time configuration system, a templating language for the Kubernetes YAML itself) you can ignore these if you don't need them and just write a bunch of YAML files and a small control file.
For minikube, data kept in the $HOME/.minikube/files directory is copied into the VM's / directory by minikube, preserving the relative path.

Updating Web Role applications (Azure) without deleting user data

I've got a Web Role on Azure with 2 Applications and 1 Virtual Directory.
One application is a backend where admins can upload files, which are stored in the virtual directory (which is accessed by both applications).
Every time I deploy a new version to Azure, all the uploaded content in the virtual directory is deleted - this is what I don't want!
So how is it possible to publish a new version without deleting all my user generated files?
I've already managed to update the application with WebDeploy. But this is only possible for the "main" application, and not the 2nd application (which is configured as a Virtual Application).
Thanks
You can't. The web role is recreated on deployment. The same can happen on hardware failure: if an instance fails, Azure redeploys a clean virtual machine and then deploys your app to it. You should never store data you want to keep on a web role; you need to use Blob storage or similar to store files you want to persist.
Virtual directories are stored on the "Application" partition, which is recreated on each upgrade - see this for more information. So the virtual directory folder is not the right place to store things you want preserved across upgrades. By the way, the "Application" partition only has 1 gigabyte of space, and some of that is used for storing your application's binary code, so you may find yourself in a "disk full" situation at some point.
If you want to store data which you don't mind sacrificing on rare occasions - like cached results - you may use the "local resources" disk, which will survive in-place upgrades and reboots. However, it is not guaranteed to be preserved if your VM crashes - for that level of durability you have to use persistent storage such as Blob storage.
Since you are talking about virtual directories and using web deploy to update application outside of the usual Azure package deployment mechanism, it sounds like your architecture/application might be more suited to a persistent VM role rather than a Web role. These are available on Azure in preview only at the moment.
http://www.windowsazure.com/en-us/home/scenarios/virtual-machines/
They let you have persistent storage that will survive a recycle. The storage is actually backed by blob storage, but it looks like a normal disk from the PVM.